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Abstract. There have been a number of papers written on semi-parametric estimation methods 
of the long-memory exponent of a time series, some applied, others theoretical. Some using Fourier 
methods, others using a wavelet-based technique. In this paper, we compare the Fourier and 
wavelet approaches to the local regression method and to the local Whittle method. We provide 
an overview of these methods, describe what has been done, indicate the available results and 
the conditions under which they hold. We discuss their relative strengths and weaknesses both 
from a practical and a theoretical perspective. We also include a simulation-based comparison. 
The software written to support this work is available on demand and we illustrate its use at the 
end of the paper. 
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1. Introduction 

We study here finite variance stocliastic processes {Xk}k>i, whose spectral density /(A), A G 
(— TT, vr) behaves like a power function at low frequencies, that is as |A|~^'^ as the frequency A — > 0+. 
The case d > corresponds to long-memory, d = to short-memory and d < is often referred 
to as negative dependence. For Xk,k G Z to be stationary it is necessary that J^^ f{X)dX < oo 
and hence that d < 1/2. We relax these restrictions in a number of ways. We shall allow the 
process to be non-stationary, requiring only that it becomes stationary after it is differenced a 
number of times. We also suppose that the spectral density (of the differenced process) behaves 
not merely like |A|~^'^ but as | A|~^'^/*(A), where /* is regarded as a short-range density function. 

Our goal is to estimate d in the presence of /*. We shall not assume that the nuisance function 
/* is known, nor that it is characterized by a finite number of unknown parameters, but merely 
that /*(A) is "smooth" in the neighborhood of A = 0, so that if one focuses only on frequencies 
A that are sufficiently low, then the spectral density /(A) behaves essentially like |A|~^'^. What 
frequency cut-off should one choose will clearly become an important issue. 

The estimation framework is semi-parametric: we must estimate the unknown parameter d 
while viewing the presence of /* as a nuisance, albeit one which complicates matters. The 
estimation method will also be local, in that, it is necessary to focus only on frequencies A that 
are close enough to the origin, where the influence of /*(A) can be neglected. 

In this paper we provide an overview and comparison of four semi-parametric estimation meth- 
ods of the parameter d which have all proven to be very effective. Two of them are Fourier-based, 
the other two are based on wavelets. The methods are: 



• Geweke- Porter Hudak (GPH): Regression / Fourier, 

• Local Whittle Fourier (LWF): Whittle / Fourier, 

• Local Regression Wavelets (LRW): Regression / Wavelets, 

• Local Whittle Wavelets (LWW): Whittle / Wavelets. 



The Fourier methods are older and better kno wn. They have es se ntially been devel oped by Peter 
Robinson in a number of fundamental papers iRobinsonI il995bl ). iRobinsonI ilQQSal l. If we ignore 
for the moment the presence of the nuisance function /*, then one has /(A) = lAj"^"^, that is 
log /(A) ~ — 2dlog|A|. Therefore, d can be estimated by linear regression on the periodog ram. 
This is the Fourier-based regression method considered in iGeweke and Porter-HudakI (jl983l ) in a 
pa rametric setting. The semi-parametric setting was suggested bv iKiinschI (jl987l ) and developed 
bv IRobinsonI (Il995bl). The Fourier-b ased Whittle method is a pseudo-maximum likelihood method 
developed bv IFox and TaqquI (jl986l ) in a parametric setting and extended in a semi-parametric 
setting by 



Robinson 



(1995a 



)• 
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The papers of 



Moulines et al 



wm), 



Moulines et al. 



(|2007a|), 



Moulines et al 



(l2007d) and 



RouefF and Taqgd (j2007l ) recast the preceding Fourier-based methods in a wavelet setting. Wavelets 
have a number of advantages. They allow differencing implicitly and therefore they can be used 
without problems when d > 1/2. They also automatic ally discount polynomial trends. The local 



Abrv et al.l ([2000!) and lAbrv et al 



ated; see also 



wavelet-based regression method was first developed bv lAbrv and VeitchI (1199811 under the s impli - 
fying assumption that the wavelet c oeffici ents are uncorre 
and the review articles 



Veitch and Abrvl (119991) 



(I2OO3I ). Il l addition, se e 



([2003) for the automatic select ion of the cut- off fr equency point and 



Veitch et al 



choice of the " scale function" . 



Veitch et al 
(l20o3) for the 



Bardetl ( 200C) and Bardetl ( 20021 ') provides asymptotic result for 



the LRW estimator in a parametric context. 



Bardet et al 



(I20(im is a first attempt to analyze the 



behavior of LRW in a semi-parametric contex t by assuming continu ous-time observations. The 



Local Whittle wavelet method is developed in 



Moulines et al 



(|2007d). 



The paper is structured as follow. In Section [2l we formalise our assumptions on {Xf.} by 
defining an M{d) process, that is, a process with memory parameter d. The standard ARIMA 
and fractional Gaussian noise examples are introduced in Section [3l The wavelet-based semi- 
parametric estimators are defined in Section S] and the Fourier estimators in Section [5l The 
semi-parametric setting is discussed in Section [6l The asymptotic properties of the wavelet 
and Fourier semi-parametric estimators are described in Sections [7] and [HI respectively. Their 
properties are discussed further in Section [9l Section [10] contains the Monte-Carlo study which 
compares the effectiveness of the four methods. In Section [TTl we illustrate the use of the software 
written in support of this work. This software may be obtained from the authors. Section [12] 
contains concluding remarks. 



2. Definition of an M{d) process 
Let X =^ {Xk}kez be a real-valued process, not necessarily stationary. Its first order difference 

is 

[AX]n=Xn-Xn-l, UGZ 

Its K-th order difference A.^ X is defined recursively. We suppose that the process X has memory 
parameter d, d G M, in short, is an M(d) process. We shall first define this notion for a stationary 
process X, where d < 1/2, and then provide a general definition for d G M. 

Let /* be a non-negative even function continuous and positive at the origin. A stationary 
process X is said to have memory parameter d, —00 < d < 1/2, and short-range density function 
/*, if its spectral density is given by 

fxiX) = \l-e-'^\-'''nX), AG(-7r,7r), (1) 
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To allow d > 1/2, we consider non-stationary processes X and extend the preceding definition, 
valid for stationary processes, in the following way. 

Definition 1. We say that X has memory parameter d, d S M (in short, an M(d) process), and 
short-range density function /*, if f* is continuous and positive at the origin and, for any integer 
K > d — 1/2, its K-th order difference A.^ X is stationary with spectral density function 



11 



-'^|2(^-'^)r(A), AG(-^,7r). 



(2) 



Observe that © is integrable since —{K — d) < 1/2. Observe also that if the 

process X is as in Definition [H then while A.^X is stationary, the process X itself is stationary 
only when d < 1/2. Nevertheless, one can associate to X the generalized spectral density function 



fxiX) 



11 



-iA|-2d 



r(A) 



(3) 



Remark 1. This definition of M{d) processes was proposed by iHurvich and Rayi (119951 ). It has 
the advantage that A.^X is stationary, but it introduces a discontinuity at the fractional points 
d = 1/2, 3/2, . . . since f i^kx is quite different at these values of d. In empirical work, there are typ- 
ically no inherent restrictions on the value of the memory parameter d, and this may cause a prob- 
lems if the degree of integer differencing required to achieve stationari ty must be gues sed in ad- 
vance. An alternative defin ition of M(d) p ro cess has been introduced by 



Robinson (1994) and later 



used b y some authors (see iTanakal (j 19991 ) , IShimotsu and PhillipsI (|2005l ) , IShimotsu and Phillips 

liS)). 



The memory parameter d plays a central role in the definition of M((i) processes because it 
characterizes the behavior of the generalized spectral density /x(A) at low frequencies. Indeed, 
assuming that /* is continuous at zero, then ([3]) implies /jc (A) ~ |A|~^'^/*(0) as A ^ 0. Allowing 
d to take non integer values produces a fundamental change in the correlation structure of a 
fractional process, as compared to the correlation structure of a standard time-series model, such 
as an ARMA(p, q) process. 

The study of M(d) processes has recently attracted attention amongst theorists and empirical 
researchers. In applied econometric work, M((i) processes with d > provide sensible models for 
certain macroeconomic time series (inflation, interest rates, ...) as well as certain financial time 
series (volatility of financial asset returns, forward exchange market premia,...). M((i) models 
encompass both stationary and nonstationary processes depending on the value of the memory 
parameter and include both short-memory series M(0) and unit-root M(l) processes as special 
cases when the memory parameter takes on the values zero and unity. 
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3. Examples 

Stationarity of the increments is commonly assumed in time-series analysis. In ARIMA models, 
for example, ([2]) holds with d = K integer and with /* equal to the spectral density of an 
autoregressive moving average short-memory process. If (i G M and /* = cr^ in (l3|), one gets the 
so-called fractionally integrated white noise process, ARFIMA(0,(i,0). The choice d G M and 

/arma(A) = ^^ |^"£^=^^'' "J.^ , AG(-7r,7r), (4) 

with 1 - Y.k=i ^kZ^ foi' l-^l = 1 and 1 - YX=\ / (so that /arma(O) / 0) leads to the 
class of ARFIMA(p, d, q) processes. 

Another example is {BH{k)}k£Zj a discrete-time version of fractional Brownian motion (FBM) 
{BH{t),t G M} with Hurst index H G (0,1). The latter is a centered Gaussian process with 
covariance 

RH{t, s) E[BH{t)BH{s)] = l{\tr + - \t - sD . 
The process {-B_H'(/i;)}fcgz is increment sta tionary (K = 1) and its generaliz ed spectral density is 



given up to a multiplicative constant (see ISamorodnitskv and TaqquI (|1994| )) by 

oo 

/fbm(A)= Y1 + AG(-^,7r). 

fc=— oo 

We can express it in the form ([3]), 

/FBM(A) = |l-e-'^|-2V^BM(A), (5) 
by setting d = H + 1/2 e (1/2,3/2) and 

/fbm(A) 



2sin(A/2)|2^+^ 



A 



+ |2sin(A/2)|2^+i J^|A + 2H~'''"' • (6) 

kj^O 

Observe that /fbm('-') ~ ^ that it is bounded on (— vr, vr). 

The process Gh = ^Bh is fractional Gaussian noise (FGN). It is a stationary Gaussian process 
with spectral density proportional to ([5]), but with d = H — 1/2 & (—1/2, 1/2). 

4. Wavelet semi-parametric estimators of the memory parameter 

In this section, we introduce the wavelet setting and, based on heuristical arguments, proposed 
possible semi-parametric wavelet estimators. We start with a brief summary of the basic ideas. 
A wavelet ip{t), t G M is a function with at least one vanishing moment, that is ,J^'ip{t)dt = 0, 
and which is low-pass, in the sense that its Fourier transform V'(0 decreases as ^ oo. We then 
define the scaled and translated versions of V, namely Vi,fc(i) = 2-^/^ip{2-H - k), j, k e Z. The 
scale index j dilates so that large values of j correspond to coarse scales (low frequencies) , while 
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the position index k translates the function ij){2~H) to Tp{2~H — k). The corresponding wavelet 
coefficients are then defined as Wj^k = J^^{'t)'4^j,k{t)'it and are used to estimate d. Because 'ip 
is low-pass, ipj^k concentrates in the low frequency region as j ^ oo and fxW "scales" at low 
frequencies since |1 — ^ |^|-2d |^| _^ q j^^ ^^le above definition of wavelet coefficients, 

we supposed, for simplicity, that the process {X(t)}t£R is defined in continuous time and that the 
integral above is well-defined. This definition can be adapted to discrete time series {Xk, k £ Z} 
by using a scale function and also to finite samples Xi , . . . , Xn by merely restricting the set of 
scale and translation indices of available wavelet coefficients. This is in sharp contrast to Fourier 
analysis, where the definition of discrete Fourier coefficients at a given frequency changes as the 
sample length increases. We now turn to a more formal presentation. 



4.1. The wavelet setting. The wavelet setting involves a scale function (j) G L^(]R) and a wavelet 
■0 G L^(M), with associated Fourier transforms 



def 



(/>(t)e-'«* dt and i^{0 = H V'(i)e"^' 

> J —oo 



We assume the following: 

(W-1) (f) arid Tp are compactly-supported, integrable, and (j){0) = cl){t) dt = 1 and i^'^it) dt 



(W-2) There exists a > 1 such that sup^gu 1-0(01 (1 + |?|)° < oo. 

(W-3) The function ip has M vanishing moments, i.e. t'^il){f) dt = for all m = 0, . . . , M- 
1 

(W-4) The function X^^.^^ — A;) is a polynomial of degree m for all m = 0, . . . , M — 1. 



Condition (W-2) ensures that the Fourier transform ij: decreases quickly to zero. Daubechies 
wavelets have a > 1 (see Table [1] below) except for Haar wavelet which is discontinuous and 



for which a = 1. Condition (W-3) it ensures that -0 oscillates and that its scalar product with 
continuous-time polynomials up to degree M — 1 vanishes. It is equivalent to asserting that the 
first M — 1 derivative of ijj vanish at the origin and hence 



|V'(A)| = 0(|A 



as A 



0. 



And, by (jCohenl . l2003l . Theorem 2.8.1, Page 90), (W-4) is equivalent to 



sup |0(A + 2kTT) 



0{\X\^^) as A 



0. 



(7) 



(8) 



As shown below, conditions (W-4)|[("W-3) imply that the wavelet transform of discrete-time poly- 
nomials of degree M — 1 vanishes. 
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We now describe the computation of the wavelet coefficients. Define the family {ipj^k^j G A; G 
Z} of translated and dilated functions 

ijj,k{t) = 2-^'/2 -k), j£Z,keZ. (9) 

Consider a real- valued sequence x = {xk, k £ Z}. We need to construct a continuous-time 
process from a discrete-time one. Using the scaling function (f), we first associate to this sequence 
the continuous-time functions 

n 

x„(t) =^ Xk Ht - k) and x(i) =^ Yxk(t>{t-k), t G M . (10) 
fe=i fcez 

The function x„ only requires the values of xi, . . . ,x„ while the function x requires the whole 

sequence {xfc, A; G Z}. Without loss of generality we may suppose that the supports of the scaling 

function (j) and of the wavelet function ip are included in [— T, 0] and [0, T], respectively, for some 

integer T > 1. This implies that x„(t) = x(i) for all t G [0,n — T + 1] and that the support of 

'ipj^k is included in the interval [2-'fc,2-'(/c + T)]. The wavelet coefficient W^^, at scale j > and 

location /c G Z is defined as 

Wf^f, = f x(t)V'i,fe(t) dt = / x„(t)V'j,fc(t) dt, j > 0, A: G Z . (11) 

The second equality holds when [2-^/c, 2^ [k + T)] C [0, n — T + 1], that is, for all (j, k) G T„, where 

Xn = {(i, k) : j >0,0<k< Uj} with nj = [2'^ {n - T + 1) - T + Ij . (12) 

In other words T„ denotes the set of indices {j, k) for which the wavelet coefficients Wj^k depend 
only on xi, . . . ,x„. If the sample size n increases, these wavelet coefficients remain unchanged 
and new ones can be computed. Thus the definition of wavelet coefficients does not depend on the 
sample length, in contrast to Fourier coefficients. The wavelet coefficient Wf'i can be computed 
explicitly by using discrete convolution and downsampling, namely, 

^j^k = ^^1 hj^vk-i = (hj,- * ^)2^k = {V [hj,. * x])fc, j>0,keZ, (13) 
where, for all j > 0, the impulse response hj^. is defined by 

hj^l = 2-^/2 f ^ l)tl;{2~H) dt, leZ, (14) 

J — oo 

where denotes the convolution of discrete sequences and is the j-power downsampling 
operator defined, for any sequence {ck}kez, by (P c)k = 01-23 ■ Since (j) ip have compact 
support, the associated transfer function Hj is a trigonometric polynomial, 

-1 

HM) = E ^i.' = E ^-.^ ■ (15) 

/GZ /=-T(2i+l)+l 
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Under assumption (W-4) , 1 1— > 4'it + l)l"^ is a polynomial of degree m and (W-3) therefore 
implies that, for all j > and all m = 0, . . . , M — 1, 



/oo 



(16) 



Now consider Pj{x) = "^i^ihjjx^ and observe that (fT6]) implies -P;(l) 
0, and hence Hj{\) = Pj{e~^^) factors as 



0, p;(i) = 0, 



Hj{X) = {I - e-'Y' H,{X) , 



(17) 



where Hj{X) is also a trigonometric polynomial. The wavelet coefficient (I13p may therefore be 
computed as 



.,.,-(i^ [/i,,*A-x])fe (18) 

where {hj^i}i^x are the coefficients of the trigonometric polynomial Hj and A^^x is the M-th 
order difference of the sequence x. In other words, the use of a wavelet and a scaling function 



satisfying (W-4) and (W-3) implicitly perform a M-th order differentiation of the time-series. 
Therefore, we may work with an M{d) processes X beyond the stationary regime (d > 1/2) 
possibly contaminated by a polynomial trend of degree K without specific preprocessing, provided 
that d — M < 1/2 and M > K + 1. It is perhaps less known that wavelets can be used with 
non-invertible processes {d < —1/2) thanks to the decay property of 'ip at infinity assumed in 



(w-2: 



4.2. Choice of the wavelets. In this paper, we do not assume that Vj^fc a-re orthonormal in 
L^(M) nor that they are associated to a multiresolution analysis (MRA). We may therefore use 
other convenient choices for (j) and ip as long as (W-l)|[("W-4) are satisfied. A simple choice is for 
instance, for some integer M >2, 



and 



V{t) = CM 



M 



lgt](2i), 



(19) 



where 1^ is the indicator function of the set A and for an integrable function /, /**'^ denotes the 
M-th self-convolution of /, 



fit - u) g{u) du , 



n=f^_^, with if^gm = 

M times 

and Cm is a normalizing positive constant such that 

Scaling and wavelet functions associated to an MRA present two important features: 1) they 
give raise orthonormal L^(M) bases {'i/'j,fc}j 2) a recursive algorithm, the so-called pyramidal 
algorithm, is available for performing the convolution/downsampling operations at all scale j. 
The complexity of this algorithm is 0(n) for a sample of length n, see iMallatI (jl998l ). In other 
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words, in an MRA, the computation (jl3p can be made recursively as j grows and it is not necessary 
to explicitly compute the filters hj^. defined in ([1 



Cohen 



Assumptions (W-l)|[("W-4) are standard in the context of an MRA, see for instance 
(j2003l ). Common wavelets are Daubechies wavelet and Coiflets (for which the scale function 
also has vanishing moments). What matters in the asymptotic theory of wavelet estimators 
presented below is the number of vanishing moments M and the decay exponent a, whi ch both 
deter mine the frequency resolution of -i/;- For standard wavelets, M is always known and ([Cohen, 
20031 . Remark 2.7.1, Page 86) provides a sequence of lower bounds (a^) tending to a as A: —> oo. 



Daubechies wavelet are defined by their number M of vanishing moments, for any M > 1 (the case 
M = 1 correspond s to the so called H aar wavelet). An analytic formula for their decay exponent 



a is available, see (iDaubechiesl. 11993. Eg f 7. 1.23^. Page 225 and the table on Page 226) and note 



that our a equals the a of lDaubechiesI (119921) p l us 1. A simpler lower bound a > (1 — log2(3)/2)M 



holds for Daubechies wavelets, see 



Daubechies! (jl992l ). Although it is not sharp, it shows that the 



number of vanishing moments M and decay exponent a of Daubechies's wavelets can be made 
arbitrarily large at the same time. Table [T] provides some values of a for Daubechies wavelets and 
the lower bound ak with k = 10 for Coifiets with given number of vanishing moments M ranging 
from 2 to 10. 



M 


2 


3 


4 


5 


6 


7 


8 


9 


10 


a (DB) 


1.3390 


1.6360 


1.9125 


2.1766 


2.4322 


2.6817 


2.9265 


3.1676 


3.4057 


aio (Coif.) 


1.6196 


N.A. 


1.9814 


N.A. 


2.5374 


N.A. 


3.0648 


N.A. 


3.5744 



Table 1. The decay exponent a or its lower bound aio of \4>{£,)\ (and hence of 
IV'(OI) with M vanishing moments. First line: M; second line: a for Daubechies 
wavelet; third line: the lower bound aio for the Coifiet. N.A. stands for not 
available (Coiflets are defined for M even). 



In view of Table [H one can observe that the decays of Coiflets are slightly faster than the ones 
of Daubechies for given M's. On the other hand the Daubechies wavelets have shorter support, 
since it is of length T = 2M while it is of length T = 3M for Coiflets. The support length impacts 
on the number of available wavelet coefficients: given a sample size n, the greater the support 
length T the smaller the cardinality of the set In defined in (jl2p . 

We should also mention the so-called Shannon wavelet ■05' whose Fourier transform ^|Js satisfies 

= 



1 for |^|G[7r,27r] 

^ ^ (20) 

otherwise. 
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This wavelet satisfies (W-2) - (W-4) for arbitrary large M and a but does not have compact 
support, hence it does not satisfy (W-1) We may therefore not choose this wavelet in our 
analysis. It is of interest, however, because it gives a rough idea of what happens when a and M 



are large since one can always construct a wavelet satisfying (W-1) - (W-4) which is arbitrarily 
close to the Shannon wavelet. 

4.3. The local regression wavelet (LRW) estimator of d. For any integers n, jo a-nd Ji; 
io ^ ji ; the set of all available wavelet coefficients from n observations Xi , . . . , Xn having scale 
indices between jo and ji is 



InihJi) =^ {{j, k) : jo < j < ji, < A; < nj} 



where rij is given in (jl2p . Consider two integers L < U satisfying 



< L < U < Jn'^= max{j : rij > 1} 



(21) 



(22) 



The index J„ is the maximal available scale index for the sample size n; L and U will denote, 
respectively, the lower and upper scale indices used in the estimation. For an M{d) process, under 
regularity conditions on the short-memory part /* and for appropriately chosen scale function 
and wavelet (f> and ip, it may be shown that as j ^ oo, a]{d, /*) = YavlW^^ol x £7222'^^' and the 
empirical variance 



a. 



2 dcf _i 



(23) 



fc=0 



Abrv and Veitch 



is a consistent sequence of estimator of c|(d, /*) (see Proposition [2]). A popular semi- parametric 
estima tor of the memory parameter d is the local regression wavelet (LRW) estimator of 
(|l998l ). defined as the least squares estimator in the "linear regression model" 

log [a]] = \oga^ + dj{21og(2)} + uj , 



where Uj = \og[ay This regression problem can be solved in closed form: 



u 



d^^W(L,C/,w)'^=^'5]t.,_,log(aJ) , 



(24) 



(in short d^^^) where the vector w =^ [wq, . 

U-L 



, wjj-l] of weights satisfies 



E 



U-L 

and 21og(2) ^ jwj = 1 . (25) 

j=0 j=0 

For U — L = £ > 1, one may choose, for example, w corresponding to the weighted least-squares 
regression vector, defined by 

w =^ DB{B'^DB)-^h (26) 
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where 



dcf 



(21og(2))-i 



B 



dcf 



1 1 
1 



(27) 



is the design matrix and D is an arbitrary positive definite matrix. We will discuss the choice 
of the regression weights w after stating Theorem [Sj which provides the asymptotic variance of 



;5LRW 



d 



(L,C/,w). 



4.4. The local Whittle wavelet (LWW) estimator of d. Let {cj^k, {j, k) G X} be an array 
of centered independent Gaussian random variables with variance Var(cj^yfc) = cr?^., where T is 

a finite set. The negative of its log-likelihood is k)€i {^^"j k/^j k ^ ^^^i^j k)^ *o ^ 

constant additive term. The local Whittle wavelet (LWW) estimator uses such a contrast process 
to estimate the memory parameter do by choosing Cj^k = ^^'^ — ^n{L, U) as defined in 
(j2ip for appropriately chosen lower and upper scale indices L and U, so that the corresponding 
wavelet coefficients Wj*^ are computed from Xi, . . . , X„. The scaling property CTjid, /*) o"^2^'^^ 
and weak dependence conditions of the wavelet coefficients then suggest the following pseudo 
negative log-likelihood 

Lx(a^d) = E iW^f + f log(a222<^)'^) , 



{j,k)ei 



where |X| denotes the cardinal of Z and (X) is the average scale, (X) =^ |X| ^ J2(jk)ejj- Define 

dxid) =^ Argmin^2>oLi((T^,(i) = Yl{j k)ei'^~'^'^^ i^j^k)"^ • '^^'^ pseudo maximum likelihood 
estimator of the memory parameter is then equal to the minimum of the negative profile log- 
likelihood, 



d^'^'^{L,U) = Argmin Lx„(L,t/)(a|(d), d) 

a!G[Ai,A2] 

where [Ai, A2] is an interval of admissible values for d and 



def 



(28) 



dcf 



log I y: 2^'^^^^' 

(j,fc)GX 



(29) 



This estimator has been proposed for analyzing noisy data in lWornell and Qpperiheiml (119921). and 
was th en c onsidered by several author s, mostly in a parametric context, see e.g. iKaplan and Kuo 
(|l993l ) and lMcCov and WaldenI (|l996l ). If X contains at least two different scales then Lj((i) — > 00 
as d — > ±00, and thus d is finite. This contrast is strictly convex, and the minimum is unique: it 
can be found using any one-dimensional convex optimization procedure. In contrast to the the 
local regression wavelet estimator, the definition of LWWE does not relies on particular weights. 
An important issue for both estimators is the choice of the scale indices L and U. The asymptotic 



WAVELET ESTIMATORS OF LONG-MEMORY 



13 



theory developed for these estimators in a semi-parametric context sheds some hght on the role 
played by these quantities, as will be explained in Section [71 



5. Fourier semi-parametric estimators of the memory parameter 



5.1. The periodogram. Given n observations Xi, ■ 
and the periodogram are respectively defined as 



, Xn, the discrete Fourier transform (DFT) 



D^'iX) ""^^ (27rn)-i/2^Xie'*^ , /^(A) ""^^ |I)^(A)| 



t=i 



These quantities are computed at the Fourier frequencies 

dcf 



A,- 



2TTj/n , for k G {1, . . . ,n}, where n = [(n — l)/2j 



(30) 



(31) 



For stationary and invertible M{d) processes (see for example iLahiril (j2003l )). the DFT coeffi- 
cients at Fourier frequencies are known to be approximately asymptotically independent outside 
a shrinking neighborhood of zero. Thus the Fourier transform performs a whitening of the data, 
and, as a consequence, Fourier methods have neat asymptotic statistical properties. 

5.1.1. Differencing an d Tapering. To overcome the presence of polynomial or smooth trends (see 



Hurvich et al 



()2005al )). or to estimate the memory parameter of an M{d) process beyond the 
stationary regime (d > 1/2), some adjustments are necessary. For instance, it has been suggested 
to apply a data taper either to the time-series X or to its 6-th. order difference A'^X. A taper is a 
non-random weight function (with certain desired properties) that is multiplied to the time-series 
(or its difference) prior to Fourier transformation. Tapering was originally used in nonparametric 
spectral analysis of short memory (d = 0) time series in order to reduce bias due to frequency 
domain leakage, where part of the spectrum "leaks" into adjacent frequencies. The leakage is due 
to the discontinuity caused by the finiteness of the sample and is reduced by using tapers which 
smooth this discontinuity. But such a bias reduction inflates the variance as will be seen later in 



a special case in the context of long memory semi-parametric estimation (see section 



Velasco 



1999a 



The idea of applyin g tape r directly to the observations X was proposed by 
Velasco and RobinsonI (|200d ). who cons idered sev e ral ta pering schemes such as the cosine bell 
and the Zurbenko-Kolmogorov tapers ([Zurbenkol Il979l ). These tapers ht, t = l,...,n have 
the property of being orthogonal to polynomials up to a given order, for a subset of Fourier 
frequencies. 



+ t^-^)hte 



it A, 







j G J5,n C {1, . . . ,n} , 



(32) 



t=i 



where A^ and h are defined in (j3ip . A problem with this approach is that the efficiency loss due 
to these tapers may be quite substantial, because the set J's n can be fairly small when 6 is large. 
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In this contribution, we rather focus on the co nstruction suggested in lHurvich and Ravi (|l995l ) 



and later developed in 



Hurvich and Chen 



(|200d ). which consists in differencing before tapering. 
Differencing is a very widely used technique for detrending and inducing stationarity. The 6- 
th order difference will convert the memory parameter of a M{d) process to d — 5, and will 
completely remove a polynomial trend of degree 5 — 1. To apply this technique, an upper bound 
to the memory parameter (or to the degree of the polynomial trend) should be known in advance. 
But if only an upper bound is known, 5 may be chosen too large and consequently the (5-th order 
difference may be non-invertible, that is one may have d — 6 < —1/2. T his situation, refer r ed to as 
over-differencing which may cause difficulties in spectral inference (see lHurvich and Ray ( 1995 )). 
As was suggested by these authors, the use of a data taper can alleviate the detrimental effect 
of over differencing. A main drawback with this approach is that tapering inflates the variance of 
th e estimator. To mi n imize this effect, the tapers should be chosen carefully. 



Hurvich and ChenI (|200d ) have defined a family of data taper depending on a single parameter 
r, referred to as the taper order. Set ht = 1 — e^''^*/" and, for any integer r > 0, define the tapered 
DFT of order r of the sequence x = {x^, G Z} as follows 



L»^(A) =^ (2^na,)- 



t=i 



dcf 



I^{X)''^'\D^{X)\' 



(33) 



where the su bscript r denotes the taper order and Or X^"=i l^iP^ is a normalization factor. 

As shown in ([Hurvich and Chenl . l200d . Lemma 0), the decay of the discrete Fourier transform of 
the taper of order r is given by 



(27rna^)~^/2^/i[e- 



t=l 



< C 



n 



{i + n\\\y 



A G (-7r,7r) 



This property means that higher-order tapers control the leakage more effectively. 

Note that the Fourier transform of the taper may be expressed as a finite sum of shifted 
Dirichlet kernels. 



t=i 



fc=o U=i 



(34) 



Since X^"^]^ e' = 0, this relation implies that for j G {1, . . . , n — r}, Yl^=i hje^^^ = so that the 
tapered Fourier transform (evaluated at Fourier frequencies) is invariant to shift in the mean. This 
shift-invariance is achieved without restricting attent ion to a coarse grid of Fourier frequencies, 
as is necessary for the Zhurbenko-Kolmogorov taper (|Velascd (|l999al )). 

However, the construction of theoretically-justified memory estimators using the tapered peri- 
odogram may require dropping some Fourier frequencies as will be seen in the definition of the 
pooled periodogram in (j35p . This is related to the following observation. For r = (no taper), the 
DFT coefficients D^{Xj) of a white noise {Zt} at Fourier frequencies A^, Xj, k ^ j G {1, . . . ,n} 
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are uncorrelated. This property is lost by tapering. For r > 1, the correlation K[D^ {Xj)D^ (Xk)] 
is equal to if \k - j\ > r, and (lirar)'^ {-!)'' (^^J if \k\ < r. 

5.1.2. Pooling. Let /^(A) be the tapered periodogram introduced in ([33]) . When considering non- 
linear transformations of the periodogram such as taking logarithm, variance reduction can be 



ing groups of finitely 



Hannan and Nichollsl ()1977l ) and 



many, 



say 



p, consecutive which results in a pooled 



Robinson 



(jl995bl )). To understand why pooling 



obtained by poo 
periodogram (see 

may be helpful, recall that if Z is a white Gaussian noise, then the variance of the running mean 
of the log-periodogram p~^^^ Sfc=i log/^(Aj4.fc) is whereas the variance of the logarithm of 
the running mean p"^/^ lo g (T^^-]^ I^(Xj^k)) i s pijj' ip), where ip{z) = r'(2:)/r(z) is the digamma 
function (see for instance iJohnson and Kota (jlDTOl )). The quantity pil^'{p) decreases from vr^/G 
to 1 as p goes from 1 to oo. For p = 3, pip'{p) is 1.1848 and its value changes slowly thereafter 
as pil^'ip) = 1 + l/{2p) + 0(p~^). Nonetheless, this shows that the variance of the local average 
of the log-periodogram is larger than the variance of the logarithm of local average and explains 
why typical values of p are p = 3,4. 

As seen above for the tapered DFT coefficients of a white noise in a non-asymptotic context, in 
order to guarantee asymptotic independence of the tapered periodogram ordinates, if p successive 
values of the periodogram are pooled, then, at the end of the block, r DFT coefficients are 
dropped, where r is the taper order. More precisely, set K{p,t) = [{n — l)/2{p + r)] and for 
k £ {!,■ ■ ■ , K{p, r)}, define the pooled periodogram as follows 



(p+T)(fe-l)+P 

= E 

j = {p+T){k-l) + l 



(35) 



where =^ p ^ J2f=(l+T)(k-i)+i -^i = (^(^^ + t)(/c - 1) + p + r + 1) n/n. The definition of the 
central frequency A^ seems somehow arbitrary. Our choice is motivated by the fact that the 
Chen and Hurvich's taper actually mixes together r adjacent periodogram ordinates, so that 
(p + ''") Fourier frequencies are mixed in each pooled and tapered periodogram ordinate. Note 
that the bias of the GPH estimator defined below is very sensitive to the definition of this central 
frequency. 

5.2. The Geweke— Porter— Hudak (GPH) estimator of d. Assume that the differencing order 
S induces stationarity, i.e. d < 6 + 1/2, and the taper order r is larger than 5. Then, for certain 
sequences {in} and {rUn} which increase slowly with n, and under smoothness conditions on the 
short memory component /*, the ratios of the pooled periodogram of the (5-th order difference 
of Y =^ A^X divided by its spectral density 7^^(Afc)//y(Afc), in < k < rrin, can be regarded as 
approximately independ e nt and iden tically distrib uted (i.i.d.) in a sense that can be rigorously 
characterized; 



^{p+r)fe 



Robinson 



(|l995bl ) and 



Lahiri 



(200 
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Based on this heuristics, a popula r semiparametric estimate of d i s the loK-p e riodog ram estimate 



of 



ap 
3( 



Geweke and Porter-HudakI (|l983l ). defined here [in the manner of lRobinsonI (|l995bl )] as the least 
squares estimate in the linear regression model 



log I^,{Xk) = log /*(0) + id- 5)gCXk) + Uk, l<k<m 



where g{X) 



dcf 



2 log 1 1 — e'''* I and is " approximately" equal to log 



regression equation can be solved in closed form : 



i^,A^k)/fx{Xk: 



(m) = 



log[7X,(Afc)] + 5. 



(36) 
This 

(37) 



To simplify the notations, we have made the dependence in the differencing, tapering and pooling 
orders implicit. The cho ice of these orders 5, r and p will b e discussed in Section [HI This estimator 
has been introduced by lGeweke and Porter-Hudald (119831 ) and was later used in many empirical 
works. 



Remark 2. In the definition of the GPH estimator in iRobinsonI (|l995bl ). the first in DFT coeffi- 
cients are eliminated, in is referr ed to as the trimming number. Trimming is sometimes required 
to eliminate de terministic trend (jHurvich et al.l . l2005al ). or to deal with non-Gaussian processes 



( Velasco 



2000). 



5.3. The local Whittle Fourier (LWF) estimator of d. Since, as mentioned above, /^(Afc)//y(Afc) 
can be regarded as approximately i.i.d. and /y(A) ~ C|l — e'^|~^'-'^~^^ in the neighborhood of zero, 
using the same arguments as in Section we may approximate the negated likelihood as follows: 



Lr^m{C,d) = ni 



III f 

log(C|l-e'^'=| 

k=i ^ 



-2{d-S) 



) + 



(38) 



C\l- QiXk\-2{d~S) 

Note that pooling is here irrelevan t becaus e non non-linear transforr nation is involved . This 
estimator was originally proposed bv lKiinschl (j 198 71 ) and later studied in 



Robinson (1995a). After 



eliminating C by maximizing the contrast (j38p . we get the profile likelihood, defined as 



LZ''{d)= log [m-'Y.Ir{Xk)\l-e 



iAfe|2{d-5) 



2(d- (5)m-^^log(|l -e^^*"! 



\ k=l / k=l 

and we define the Local Whittle Fourier estimator (LWF) as the minimum 

iP^^{m) = Argmm Lr,m{d) + 6 . 



(39) 



(40) 



The function d Lr^m{d) is convex and thus admit a single global minimum, which can be 
obtained numeri cally by using a standard one-dimensional convex optimization algorithm. In 
RobinsonI (jl995al ) , the minimization in ()40p is performed over a closed interval which was supposed 
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to include the true value of the parameter. But, because the contrast is strictly convex, there is 
in fact no need to impose such a restriction. 

5.4. The exact and the non-stationary extended local Whittle estimators. To conclude 
this section, let us mention two recent works on the estimation of the memory parameter for 
non-stationary M{d) processes, d > 1/2, which are not covered in details in this contribution 
because they are derive under slightly different conditions and henceforth do not compare well 
with wavelet estimators. 



Shimotsu and PhillipsI (|2005l ) introduced an exact local Whittle estimator. It is applicable 



when the M{d) series is generated by a linear process and when the domain of d is not wider than 
9/2. Their estimator is based on fractional differencing of the data and the complexity of their 
algorithm is of the order n?, where n is the number of observations. In contrast, the complexity 



Moulinesetal 



of the estimators we consider is of the order of n log9(n); see the discu s sion i n 
(|2007d ). Note also that the model considered bv IShimotsu and PhillipsI (|2005l ) is not an M{d) 
process in the sense given above and is not time-shift invariant, see their Eq. (1). In addition, 
their estimator is not invariant upon addition of a constant in the data, a drawback which is not 
ea sily dealt with, see their Remark 2. 



Abadir et al.l (|2007l ) propose to extend the local Whittle estimator to d G (—3/2, oo) calling it 



the fully extended local Whittle estimator. This estimator is based on an extended definition of 
the DFT D^{Xj,d), which include correction terms, i.e. D^{Xj,d) (Xj) + k^{Xj,d). The 

correction term k^{Xj,d), which takes constant values on the intervals [p — l/2,p + 1/2), 
p = 0,l,... is defined as k^{Xj,d) = if d € (-1/2, 1/2) and 



fe-'^(A,-,d)=e-^^^(l- 

r=l 

(27rn)-i/2(A--iX, 



-iA, 



d£ [p-l/2,p + l/2) 



p 



1,2, 



(41) 



where Zn,r = (27rn)~ ' (A''~ X„ — A.^~ Xq), r = 1, . . . ,p. The corresponding corrected peri- 
odogram I^{Xj, d) is given by I^{Xj, d) = \D^{Xj, d)\'^ . The extended local Whittle estimator is 
then defined as dP^^ = Argminj^j^^^^.^^ d^ia.^] ^rn^^(^) where L^^{d) is 



LWF 



{d) '= log ( m 



i5]F(A,,d)|l 

fe=i 



i At 1 2d 



■m 

2dm~^^\og{\l 

k=l 



Note that to compute k^{Xj] d), we have to involve additional ob s ervat ions X_p_|_i, . . . , Xn, where 
p *== V [dmax — 1/2J, Compared to the Shimotsu and Phillips ( 2005 ) estimator, this estimator 
is easy to evaluate numerically, but the approximation of the extended local Whittle function is 
not continuous at d = 1/2, 3/2, . . . , which does not allow one to obtain limit theorems at these 
points (and, in the finite sample case, causes disturbances in the neighborhood of these values). 
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In addition, this estimator is not robust to the presence of polynomial trends in the data (if d is 
the memory parameter, the method tolerate a polynomial trend of degree at most [d + 1/2J). 

6. The semi-parametric estimation setting 

The the ory of seni i -parani etric Fourier est i mators was developed in two fundamental papers by 
Robinson, Robinson (1995b) and iRobinson (1995a), which establish, under suitable conditions. 



the asymptotic normality of the GPH and the LWE estimators in the stationary case. These 
results were later extended to non-stationary M(d) processes for different versions of the memory 
estimator and under various sets of assump tions. The theory of sem i-par ametric wavelet estima- 

i2007b ) and 



tors was developed much more recently in 
(some preliminary results are in 



Moulines et a. 



Bardet et al 



Moulines et al. 



(I2007d 1 



(12000|) and iBardetl ^2002 )). To allow for compar- 
ison the wavelet and the Fourier approaches, the asymptotic properties of the estimators are 
presented under a common set of assumptions. Because the theory of wavelet estimators is much 
less developed than the theory of Fourier estimators, these assumptions can often be relaxed in 
the context of Fourier estimators. 

There are two types of additional assumptions that enter into play in an asymptotic theory. 
First, the semi-parametric rates of convergence depends on the smoothness of the short-memory 
component in a n eighborhood of zero frequency. The most common assumption, introduced in 
([Robinsonl . Il995bl ). is a Holder condition on the short-memory component of the spectral density 



Definition 2. For any 0</3<2, 7>0 and e G (0,7r], 7i{(3,^,e) is the set of all non-negative 
and even function g that satisfies g{0) > and for all A G (— e, e) 

|<7(A) - 5(0)1 < 75(0) |A|^ (42) 

The larger the value of (3, the smoother the function at the origin. Observe that if /* is 
even - as assumed - and if it is in addition infinitely differentiable, then /*'(0) = and hence, 
by a Taylor expansion, (H2T) holds with (3 = 2, that is, in this case, one has /* G 7i{2,^,e). 
Andrews and GuggenbergeiJ ( 2003 ) extend this definition to the case /3 > 2 by considering even 
functions satisfying 



9(A) - 5(0) 



L/3/2J 
k=l 



<jgmxf-'W2\ 



(43) 



with |(/9/j| < 75(0), for any k G {1, . . . , [/3/2j}. To take advantage of this more refined smooth- 
ness assumption when > 2, b i as reduction techniques mu s t be applied (see for ex ample 



Andrews and Guggenberger (|20o3); iRobinson and Henryl ()2003l ): [Andrews and SunI (120041 ) ). We 



will only consider (3 <2 for sake of brevity. 



WAVELET ESTIMATORS OF LONG-MEMORY 



19 



Second, the definition of an M{d) process accounts only for the spectral (or equivalently co- 
variance) structure, which specifies the distribution of the process if X is Gaussian. To extend 
the results in the non-Gaussian context, it is necessary to specify the distribution of the process 
beyond its second-order properties. The most complete asymptotic theory has been developed so 
far for linear processes. 

Definition 3. We say that X is a strong linear M(d) process if there exist a {Z)-sequence 
{as}s&i. and an i.i.d. sequence {Zs}s£z satisfying E[Zo] = 0, ]E[Zq] < oo, and for any t G Z, 



t-s 



K= [d+l/2\ 



(44) 



Remark 3. According to the standard terminology, strong here refers to the fact that {Zt} is 
i.i.d. (or a strong white noise). This assumption can often be relaxed by supposing that {Zt\ is a 
martingale difference sequence ( E [Z^ | Tt-i] = where Tt = ct{Zs, s <t) is the natural filtration 
of the pr o cess) s atisfying various additional conditional moment assumptions. For example, 



RobinsonI (jl995al ). and many authors after that, assume that {Z^ — ^\ZfW is a square integrable 



martingale difference. 



The following theorem, established in iGiraitis et al.l (jl997l ). provides a lower bound for the 
estimation error. 



Theorem 1. Let dmin < c^max in 
c > such that, 



lim inf inf 



n "'mill 



sup 

<d<dn 



., £ £ (0, vr], /? G (0,2] and 7 > 0. There exists a constant 
sup ¥d f* fn^/(^^+^)|d„ - (i| > c) > 0, (45) 



where the infimum inf^ is taken over all possible estimators based on {Xi,--- ,X„} and IPdj* 
denotes the distribution of a Gaussian M(d) process {Xt}tez with generalized spectral density of 
the form ([3]) . 

We shall see in the sequel that the best possible rate 71^/(2/^+1) -g achieved by both wavelet and 
Fourier estimators when X is a strong linear M{d) process. 

The theory of the semi-parametric estimation of d for several non-linear M(d) processes, used 
in particular in financial econometric, and in tele t rafic rn odeli ng, have also been investi gated 



in the literature (see the recent surveys iDeo et all (j2006al ) and iTevssiere and Abrvl (|2007l ) and 
the references therein). The stocha stic volatility n iodel ( a special in s tance o f the signal plus 



Hurvich et al 



(|2005bl l: 



Deo et al 



(12006b|), which estabhsh 



noise model) has been considered in 
consistenc y, rate of converge nce and asymptotic normality of an appropriately modified LWF 
estimator. iDalla et al.l (|2006l ) provide general conditions under which the LWF estimator of the 



memory parameter of a stationary process is consistent and examines its rate of convergence. 
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This class of processes include, among others, signal plus noise processes, nonlinear transforms 
of a Gaussian process, and exponentia l gener alized autoregressive, conditionally heteroscedastic 



(EGARCH) models, etc... iFav et al.l (|2007l ) provide the consistency and rate of convergence 



of the LWW estimator for the infinite source Poisson process. The results are in general not 
as complete as in the linear case, a nd the requ i red a ssumptions are specific to each con sidered 
model (abstract assumptions like in (jPalla et al.l . l2006l . Eq. (8)) or (jMoulines et al.l . l2007d ) can be 
used, but checking these still require model-dependent conditions). In addition, the asymptotic 
behavior of the estimators are model-dependent and might depart significantly from the results 
obtained in the linear case. 



7. Asymptotic properties of the wavelet estimators LRW and LWW 

7.1. The between-scale process. Before stating the main known results on the asymptotic 
behavior of LWW and LRW estimators, some additional definitions related to the spectral density 
of wavelet coefficients are required. 

If the process X is an M.{d) process, as defined in Section [2l and M > d — 1/2, then A*^X 
is weakly stationary. It follows that the process {W^iJ\kez of wavelet coefficients at scale j > 
is weakly stationary in k. However the two-dimensional process {[W^^, of wavelet 

coefficients at two different scales j and j', with j ^ j' , is not weakly stationary. This is why we 
consider the between-scale process {[VKj'^, W^j^(j — j')'^]'^}k£Z ; where 



^ -n,2"fc' ^j-«,2"fc+l) • • • 5 ^j-n,2"fc+2"-lJ • 



(46) 



The index u in (|46|) denotes the scale difference j — j' > between the finest scale j' and the 
coarsest scale j. If u = 0, that is j = j', then W^^(O) is the scalar Wj^/^. For u > 0, the second 
component of the between-scale process is a vector whose entries are the wavelet coefficients with 
scale index j — u = j' and translation indices 2^~^' k, 2^~^'k + 1, . . . , 2^~^' {k + 1) — 1. Using (fT3|) . 
it follows that 



Wfju) = [{hj^u,-*^)2ik+l 2i--, / = 0, . . . , 2" - 1]^ 



,2" - 1 



where B is the shift operator defined, for any sequence {ck}kez, by (Bc)fc = Ck-i- The between- 
scale process is stationary in k because all its entries can be expressed with the same downsampling 
operator applied to jointly stationary time series, namely, /i,-, * x and B-^^'-^[hj 

„u,- * xj — 

[B~' ^hj-u,] * X, Z = 0, . . . , 2" — 1. One can therefore write, for all < u < j, 
Cov;(Mf„W^^,,(7.)) = r ei^('^-'=')D,-„(A;/)dA , 

J —n 
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where Y)j^u{X; f) is the cross-spectral density function of the between-scale process. The case 
u = corresponds to the spectral density of the within-scale process {VFj'^jfcg^. When X is an 
M{d) process with spectral density function we often denote Dj^„(A; /) by Dj^„(A; d, /*). 

7.2. Generalized fractional Brownian motion. We shall approximate the within- and between- 
scale spectral densities Dj^„(A; d, /*) of the process X with memory parameter d G M by the 
corresponding spectral densities of the generalized fractional Brownian motion B(^dy This process 
is parametrized by a family of smooth test functions 0(t), t £ M and is defined as follows: 
9 £ &{d)} is a mean zero Gaussian process with covariance 



Gov(i?(,)(0l), i?(,)(02)) 



|A| 



-2d , 



L(A)02(A)dA. 



(47) 



The finiteness of the integral /j^ | Aj"^*^ |0(A)p dA provides a constraint on the family @(d)- For 
instance, when d > 1/2, this condition requires that ^(A) decays sufficiently quickly at the origin 
and, when d < 0, it requires that ^(A) decreases sufficiently rapidly at infinity. Hence, under 



(W-1) (W-4) , 9 can be a wavelet ^/^ if d G (1/2 — a,M + 1/2). The discrete wavelet transform of 



B(^d) is defined as 



(48) 



Cov«tw(.J,(n; 



As shown in (jMoulines et al.l . l2007bl . Relation (35)), for all j, k and k' in Z, and n > 0, one has 

2^''^ r Doo,u(A;d)e'^('=-'=')dA, (49) 

J —IT 

where Doo,u(A; d) does not involve j and is given by 

" ^ (50) 



Doo,«(A; d) = ^\\ + 2/7r|~2<i e„(A + 2l7r) ^{X + 2lTr)ip{2~''{\ + 21-k)) 



where, for all u>0, e„(0 = 2-«/2 [1, e-'^ . . . , e-'(2"-i)2 ^s mentioned above, 

W^^jJ is well defined under | ( W- 1 )|( W-4 ) | when d G (1/2 - a, M + 1/2). Under the same condition, 
using the decay condition of -0 in (W-2) and d?]), one easily gets that for any n > 0, Doo,M(A;d) 
is continuous (— vr,7r) \ | 0) and |Dno^„(A ; (j)| = OdAp'^^'^"'^)) as A ^ 0. If moreover d < M, one 
has (see Relation (72) in 



Moulines et al 



sup 2 

n>0 



u(2d-l/2) 



(l2007d)) 
/ |Doo,«(A;(f)| dA < oo . 

J —n 



(51) 



The variance Cj (d, /*) = Var[Wj^'^] can be interpreted as a scale spectrum, in words, the power 
at scale j. It is approximated by /*(0)K(d) 2^-^'^ where the constant K{d) is given by 



K{d) 



dcf 



(52) 
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7.3. Uniform bounds. Using (jMoulines et al.l . l2007bl . Theorem 1) and (jMoulines et al 
Theorem 1), the following result holds. 



Proposition 2. Let X he an M(d) process with d G M and f* £ 7i{P,'y,e) for some /?,7 > and 

(53) 



e G (0,7r]. Assume (W-1) (W-4) with (1 + /3)/2 -a<d< M + 1/2, 



a]{dj*)- f*{0)K{d)2^^'^ < C/*(0)L2(2'^-^)^' 
If moreover e = ir and d < M , then, for all A G (— tt, vr), j > u > 0, 

D,va(A;d,r) - r(0)Doo,„(A;d)22i^| < C f*{0) Ll^^''-"')^ 
where \y\ denotes the Euclidean norm of any vector y. 



(54) 



In (|53p and (j54p . the constant C only depends on d, (3 and on the wavelets -0 and (f). It can be 
made independent of d on any compact set included in ((1 + 0)/2 — a,M + 1/2) for (j53p and in 
((1 + (3)/2 — a, M] for (15^ . This proposition shows that the covariance properties of the wavelet 
coefficients of an M((i) process resemble those of the generalized FBM at large scales. The 
l atter are not, in general, de c orrela ted as sometimes heuristically assumed (see lAbrv and Veitch 
(j 19981 ) and lVeitch and Abrvl (jl999l ) for example). Exact decor relation occurs but in very specific 
cases: if {V'j.fc} is an orthonormal basis of L^(R) and d = 0, see ( Moulines et al.l . l2007bl . Remark 7). 
Due to its very specific spectral property, the ideal Shannon wavelet coefficients (defined in (f20ll ) 
of Bd satisfy partial independence for d / 0. Indeed, applying (f20]) in (fSOl) . we get, for all 
A G (-vr,7r). 



Doo,«(A;(i) 



|A|)- 



-2d 



for u> 1 
otherwise. 



(55) 



implying that w'-'^^ and W^f\, are uncorrelated for j ^ j' and that wj^^ and wj^^, are uncorrelated 
only if d = 0. 

The asymptotic behavior of wavelet estimators c?'^^(L„, ?7„, w„) and d^^^(L„,C/„) defined 
in ()24p and ()28p will be derived for specific lower and upper scale sequences (L„) and {Un)- In 
the semiparametric framework, the lower scale sequence (Ln) governs the rate of convergence of 
the memory estimator. There are two possible settings as far as the upper scale sequence [Un] is 
concerned: 



(S-1) Un — Ln is fixed, equal to ^ > 
(S-2) Un < Jn for all n and Un — Ln - 
defined in (122)1. 



oo as n — > oo, where Jn is the largest available scale 
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7.4. Asymptotic properties of the LRW estimator. We will use the following definition, for 
all i,j > 0, 



^idW = V77^ / |Doo,|i-i|(A;d)| d\, 



(56) 



with P fyi.nfA; d) G M^" defined by ([50|) . The following result, adapted from 



Moulines et al. 



(|2007al ). ap plies to Gaussian M .( d) pr ocesses; it has recently been extended to strong linear M{d) 



processes m 



Roueff and Taqqu 



(120071) 



Theorem 3. Let X be a Gaussian M(d) process with generalized spectral density given by ([3|) 



with d eR and f* G H{i3,'j,e) for some 7 > 0, /? G (0,2] and e G (0,7r]. Assume \(W- W-4) 
with 



{1 + f3)/2-a<d<M . 

Let (Ln) be a sequence satisfying 

lim ln2-(^+2^)^" + (n2-^")-i| = . 

n— >oo L J 

and w be a weight of length £ + 1 satisfying (|25|) . Then, as n ^ 00, 

Vn2-^" (^^^^(L„, Ln + ^, w) - ^mWY. ^i^^Ad)^J 

\ i,j=0 



(57) 



(58) 



(59) 



Remark 4. This result is stated in 



obtained by using (jMoulines et al 



Moulines et al 
2OO7J, Corollary 2). 



(j2007al ) with e = vr. The case e < it can be 



Remark 5. The larger the value of (3, the smaller the size of the allowed range for do in (1571) for a 
given decay exponent a and number M of vanishing moments. Indeed the range in (j57p has been 
chosen so as to obtain a bound on the bias which corresponds to the best possible rate under the 
condition /* G Ti.{l3,^,e). If ([57]) is replaced by the weakest condition do G (1/2 — a,M], which 
does not depend on the same CLT ([57l) holds but /? in Condition (j58]) must be replaced by 
p' G (0,/3]. This P' must satisfy 1/2 - a < (1 + (3')/2 - a < do, that is < /?' < 2(4 + a) - 1. 
When (3' < [5 one gets a slower achievable rate in (I59p . 

Remark 6. The condition n2~^^"*"^^^^" — > guarantees that the bias is negligible in the limit. The 
optimal rate n^/^^^'^l^'^ given by Theorem [T] is obtained with n2~^^^'^^'^^" x 1, in which case the 
squared bias and the variance are of the same order of magnitude, see Theorem 3 in 



Moulines et al. 



(j2007bl ) where uniform bounds of the mean square error and an exact equivalent of the variance 
are given. The asymptotic equivalent of the variance is (n2~^")~-^ Si j=o ^ ^^'^ 

expected from ([59]) . 
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Remark 7. Theorem [3] applies only to the setting (S-1) since Un = L^+L Under the setting (S-2 
[/„ — L„ — > cx) as n — > oo, o ne has to replac e the w eights w by a sequence of weights (w„) of 
lengths [/„ — L„ + 1. Using ([Moulines et alj . l2007bl . Proposition 4), if e = vr, it is possible to 
extend the Theorem [3] to this setting, provided that Wn i — > Woo i as n — > oo for all i and 



lim sup |t(;.„,j|2*/^ = , 

»oo ^ — ^ - ' 



(60) 



i>l 



in which case one has 



V 



I 0, J] «ioo,*V,j((i) 

i,j=0 



w, 



The variance in the right-hand side of the last display is finite as a consequence of ()5ip and (j60p . 

The standard theory of linear regression shows that, for any fixed i > 1, the optimal design 
matrix is D = V'^^dJ), where V(d,£) = [Vjj(d)]Q<. <^ is a + 1) x + 1) matrix. By (f26D . 
the corresponding weights read 



w°P*(d,^) = V"\dJ)B{B^Y-\dJ)B)-^h , 



where B and b are defined by ()27l) and the associated limiting variance is 

pI (dj) = w°P'{djfv{d,£)w°P\d,£) = h^{B^Y-\d,£)By^h 



(61) 



(62) 



Since the regression vector weights w of length I can be viewed as regression vector weights of 
length i + 1 with zero as last coordinate, we have that p'^p^{d,£) decreases as £ increases and we 
will denote its limit by p'^p^{d, oo). Figure [3] shows that the limit is approximately attained for 
£ >7 for a standard choice of wavelet. 

The optimal regression vector w^p' (d, £) cannot be used directly since it dep ends on unkno wn 
the memory parameter d, but a plug-in method can be used as suggested by iBardetl (j2002l ) in 
a similar context: a preliminary consistent estimator, say is used to estimate the optimal 
weights w = w°P*((?'^), £) and then the estima tor d^^^ {L, U, w ) is ap plied. 

A different choice of weights is suggested by lAbry and VeitchI (119981 ) (in a parametric context). 
This choice relies on the approximation that Doo,ii(A; d)/K(d) is nearly zero for u > and nearly 
constant equal to (27r)^^ for n = 0. This provides a diagonal approximation of \^~^{d,£) yielding 
a diagonal design matrix D with diagonal entries Di^i = 2~*, ? = 0, . . . , ^, up to a multiplicative 
constant. By straightforward computations, this design matrix define the following Abry-Veitch 
weights. 

dcf (i - ??£)2"* 



w 



AV 



(^) 



21og(2)K£(2 - 2^ 



0,... 



(63) 
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where 



m 



def 



E 



2-J 



J-. 



and Ki =^ y^(i - rjeY 



2-^ 



2-J 



2-^ ■ 



(64) 



j=0 j=0 

This choice, while not optimal, is closed to it in practice, at least for not too large values of d 
and with a standard choice of wavelets, see Figure [2j One advantage of this choice of regression 
vector weights stems from the fact that it does not require the use of a pilot estimator, since the 
weights do not depend on the unk nown parameter d. 



Let us denote, for all n > (see 



Moulines et al. 



lu{d) 



dcf 



|Doo,n(A;d)|' dA = (27r)-i J^Cov^ (vF^J, w['l^ , 



(65) 



wi th B^ JX\d) £ M^" def ined hy §0^. In the case where the weights w are chosen as proposed 
bv lAbrv and VeitchI (jl998l ) given by (j63p . the asymptotic variance in the right-hand side of (j59p 
reads 

i,j=0 

where ^ -|- 1 is the number of scales used in the regression. Inserting (j63|) and (|65p . we get, for 
any i > 1, 



p\d,i) 



TT 



(2-2-0K,(log(2)K(d))2 



j{i - 7]i){i + u-r]e)} , (66) 



^ u=l i=0 ) 

where K(d) is defined in (j52p . and r/^ and in (j64p . When ^ is large, the last display can be 
approximated by its limit as £ — > oo, namely. 



P^{d, oo) 



dof 



vr 



[21og(2)K(d)]^ 



Io(d) + 2^I„(d)2(2'^-i)" 



(67) 



u=l 



7.5. Asymptotic properties of the LWW estimator. Let us now consider the LWW esti- 
mator dP^ ^ (Ln,Un) defin e d in (l28l) . The following results was first proved fo r a Gaussian M(d) 



process m 
(I2OO7I). 



Moulines et al. 



(120073) and then extended to linear processes in iRoueff and TaqquI 



Theorem 4. Let X be a strong linear M(d) process with generalized spectral density given by 



with deR and f* e n{p,j,e) for some 7 > 0, /? G (0,2] and e G (0,7r]. Assume \(WA%(W-4) 
with Condition |57[). Let (Ln) be a sequence satisfying 



hm |n2-(i+2/^)^" +L2(n2-^")-Vn = 



(68) 
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and iUn) be a sequence such that either (S-1) or (S-2) holds. Then, as n ^ oo, 



{n2-'^")'/^d^'^'^{K,Un)-d)^Ar[0,p\d,e)] , (69) 

where i = lim„_>oo(C^n — Ln) G {1, 2, . . . , oo} and where p^{d, I) is given by (\66h for I < oo and 
§7^ forl = CO. 



Remark 8. As in Theorem[3l the condition n2~'^^~''^^)^" guarantees that the bias is neghgible 
in the Umit by imposing a sufficiently fast growth for L„. The condition L^(n2~^")~^/^ — > 
means that n2~^" has to grow faster than which is at most of order log'^(n) and hence always 
holds in the typical regime where n2~^" x n'^ with 7 G (0, 1). Relation (j69p is an asymptotic 
normality result. If we are interested merely in the rate of convergence of the L^y W estimator 



d^ww then we can relax condition (1681). It follows from (IMoulines et al 



2007c. Theorems 1 



and 3) that if under the assumptions of Theorem [H ([68]) is replaced by 



lim 

n— >oo 

then 



L-i + L2(n2-^")-V4}=o, 



iP^'^iK, Un) = d + Op ((n2-^")~^/^ + 2~^^- 
Hence, for 2^" x 7^1/(1 +2/?)^ -^g the optimal rate 71^/(^+2^), stated in Theorem [TJ As shown in 



Moulines et al. 



(|2007d ). this result holds for a class of "weak" linear M((i) processes (see remark 



El). 

The following observations, which follow directly from Theorem U] seems to have be unkwnown 
so far: 

Corollary 5. The LWW estimator has the same asymptotic variance as the LRW estimator with 
Abry-Veitch weights 



Corollary 6. Among all wavelet estimators of the memory parameter d presented in this paper, 
for a given choice of wavelet and scales involved in the estimates, the estimator with optimal 
asymptotic variance is the LRW estimator using the optimal weights defined in (|6ip . 



As explained above, the optimal LRW in Corollary [6] estimator requires plugging a preliminary 
consistent estimator of d. 

7.6. Asymptotic variances. The asymptotic variances of both the LRW and the LWW esti- 
mators depend on true value of the memory parameter d and on the wavelet ip, as they are all 
expressed in terms of Doo,u(A; d), defined in (j50p . In practice, one estimates the limiting variance 
p'^{d, i) by p'^{d, €) in order to construct asymptotic confidence intervals. The continuity of p^{-,i) 
and the consistency of d justify this procedure. 
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A co mparison of the asymptotic variances p {d, i) for sev eral wavelets caii be foun d in 



Moulines et al. 



i2007cl ) 



Moulines et al 



indicates, the 



(|2007d ). (see also Figured]) In particular, as Figure 1 in 
choice of wavelets does not matter much (provided that (1 + /S) /2 — a < d < M holds) and a sen- 
sible approximation can be obtained by using the Shannon wavelet, for which a simple expression 
of the asymptotic variance can be obtained thanks to ()55p . Using the Shannon wavelet in (jSOp . 
we get, for all A G (— 7r,7r), Doo,«(A;(i) = for u > 1 and T)oo,o{^',d) = (27r — lAj)"^*^ so that, for 
all (i G M, (j66p becomes 



where g{x) 



A^dA . 



2(2-2-0^£log'(2)<72(-2d) 
8. Asymptotic properties of the Fourier estimators GPH and LWF 



(70) 



8.1. Asymptotic properties of the GP H estimator. Th e consistency and asymptotic nor- 
mality of the GPH have been established bv iRobinsonI (|l995bl ) for stationary invertible Gaussian 



M (d) process —1/ 2 < d < 1/2 with no data taper (r = 0) and no differencing {5 = 0). As shown 
bv IVelascd (jl999bl ). the GPH estimator with (r = and = 0) exhibits non-standard behavior 
when d > 1/2. Although it is consistent for d G (1/2, 1] and asymptotically normally distributed 
for d G (1/2,3/4), the GPH estimator has a non-normal limit distribution for d G [3/4,1], and 
for d > 1, it converges to 1 in probability and is inconsistent. Hence, the interest in applying the 
GPH estimator under differencing 6 > and tapering r > 0. 



The following result is adapted from iMoulines and Soulieij (|2003l ). To state the results, some 
additional notations are required. If {Zt} is a Gaussian white noise, Ip.^{Xk) is distributed as 



|Gp,T-|P/2 where Gp^r = [Gp^^ 



(1) 



) >J"p,r J 



covariance matrix Sp^,-, whose expression is given in 



is a 2io-di mensional zero-mean Gaussian vector with 

(I2OO2I 1. Define 



Hurvich et al 



a. 



7p,. = E[log(||VFp,,||V2)] 
Numerical expressions for these quantities are given in 



Var[log(||iyp,,||V2)] . 

i2002l l. 



(71) 



Hurvich et al 



Theorem 7. Assume that X is a Gaussian M{d) process and f* G 7i*{(3, 7, e) for some (3 G (0, 2], 
7 > 0, e G (0, vr] . Let 5 > be the differencing order, t > be the tapering order, and p > 1 be 
the pooling order. Let rUn be a non- decreasing sequence of integers such that 

lim (m-^ + mff+^n-'^'^) = 0. (72) 

n— >cx> 

Then, for any d satisfying 

5-T-l/2<d<5 + l/2 , (73) 
the GPH estimator defined in (j37p satisfies, 

V^(dG™(m„) -d)^ Af{0, alj4) . (74) 



where ap.^ is defined in ([71]) . 
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Compared to the wavelet estimators, the size of the confidence intervals does not depend on 
d, which may be seen as a significant advantage. On the other hand, there is an inflation of the 
variance over all the interval {6 — t — 1/2,5 + 1 /2) and it can be greater than the limiting variance 
obtained using by the wavelet estimator, at least for certain values of the memory parameter. 

The definition of the pooled periodogram ()35p implies that the number of Fourier frequencies 
used to evaluate dP^^{mn) is equal to {p + T)mn- Hence the efficiency ratio between two GPH 
estimators using same number of Fourier frequencies but two di fferent pooling numb ers, say p 



2002, Theorem 



and p', may be expressed as [p + T^a^^^j [p' + t)(t^, ^. As shown in (iHurvich et al. 
1), the function p i— > (p+r)(Tp ^ is decreasing showing that pooling increases asymptotic efficiency. 
In addition, for any fixed r, limp_»oo(p + T)a'^ .^ = '&(r), where 



$(t) 



r(4r + l)r4(r + l) 
r4(2r + 1) 



(75) 



As seen below, this efficiency bound is achieved by the local Whittle estimator. Therefore, 
the order of pooling can be made arbitrarily large, and at least asympto tically, an increase in 



Hannan and Nicholls 



the po oling order will result in a decrease of the asymptotic variance; see 
(|l977l ) and Theorem [71 In practice, of course, this is not possible and since the improvements in 
asymptotic efficiency happen quickly, there is no need to use a very large p; p = 3, 4, 5 are typical 
values. 



Remark 9. As observed in ()73p . the differencing order 5 and the taper order r control the range of 
values of the memory parameter d which can be inferred. The number of differentiation 5 controls 
the upper bound for d, while the taper order r controls the range. These two parameters are 
independent, by choosing the number of differentiations 5 and the taper order r we can therefore 
cover any intervals of admissible values for d (the same comment apply to the LWF estimator). 
If r = 0, the interval over which the GPH estimator is consistent and asymptotically normal 
is (5-1/2,(5 + 1/2). If r > 0, the range is [5 - r - 1/2, (5 + 1/2], as indicated in The orem El 



Velasco 



Note th at these ranges may not be optimal: for r = 0, using the sharpened results from 
(jl999bl ). the range over which the memory parameter is asymptotically normal can be shown to 
be (5-3/4,(5 + 3/4). 

Remark 10. It is possible to replace the Gaussian assumption by the weaker assumption that 
the process X is a strong linear M{d) process by adding moment and regularity conditions on 
the distribution of the driving noise in the definition (|44|) . In this case however, the estimator 

^PH 

(m„) should be slightly modified to avoid a number of Fourier frequencies near 0. In addition, 
jooling are then required, even if the process X is stationary and invertible; see 



tapering and 



( Velasco 



2000, Theorem 3) and lFav et al.l (|2004l l. 



WAVELET ESTIMATORS OF LONG-MEMORY 



29 



8.2. Asymptotic properties of the LWF estimat or. The consiste ncy and asymptotic nor- 
mality of the LWF estimator have been established by iRobinsonI (|l995al ) for stationary invertible 
linear M{d) process —1/2 < d < 1/2 with no data taper (r = 0) and no differencing {5 = 0) 
(under the weaker assumption that {Zt} in (j44p are martingale differences, wh ose squa r es, cen - 



Velasco (1999a 



tered at their expectation, are also weakly stationary martingale differences), 
has shown that the LWF estimator with r = and 6 = was consistent for d £ (—1/2,1] an d 
asymptotically AA(0 , 1/4) for d G (—1/2,3/4) under the same assumptions than lRobinson ( 1995a ). 
(jHurvich and Chenl . 12000. Theorem 2) estab l ished Theorem [8] for r = 1 and 6 = 1. This result 



was later extended in 



Moulines and Soulierl (I2OO3I ) to general r and 6. The consistency of the 



LWF was es tablished (w i th 6 = and r = 0) for —1/2 < d < 1/2 for a general class of non-linear 



processes m 



Dalla et al, 



(120061) 



Theorem 8. Assume that X is a strong linear M(d) process for some d € M and f* G 'H*{P, 7, e) 
for some (3 G (0,2], 7 > 0, e G (0, vr]. Let 6 he the differencing order and r he the taper order. 
Let irin he a non decreasing sequence of integers such that 



lim 

n— »oo 



. 



Then, the LWF estimator defined in ()40p satisfies, for any d satisfying 

6-T-l/2<d<6 + l/2 , 



I / jLWF/ 



V 



d)^N{0,^T)/A) , 



(76) 

(77) 
(78) 



where ^{t) is defined in ()75p 



The quantity ^{t) quantifies the loss of efficiency due to tapering. As the taper order increases, 
the limiting variance inflates: $(0) = 1 (no tapering), $(1) = 1.5, <I>(2) = 35/18, etc. Since the 
LWF estimator is based on linear functionals of the periodogram, pooling is irrelevant. Recall 
that, in the definition (|10]) . the standard periodogram is used and not the pooled one. This is 
why the pooling order does not appear in the conditions of Theorem [8l 

Remark 11. The cond ition on the bandwidth (I76p is slightly weaker than Assumption A4' in 



Robinson (1995a) and 



Hurvich and Chen ( 2000 ) . It see ms that the log^(?n.„) term in these as- 



sumptions is superfluous (see (jAndrews and Sunl . |200J, Comments of Assumption 4) for the 
required adaptation of the proof). 

9. Discussion 

The Fourier and wavelet estimators of the memory parameter present similar characteristics and 
distinctive advantages. In Tabled we summarize the main features of the estimators considered 
in this paper. 
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V V CLV CldjO 


Non-stationarity 
{d large) 


Pre-apply (/ - B)^ 
with 5 > d-1/2 


take M>d 
(a sufficient number 

OT "^rfi Til c; n 1 n CT inoTTiP'nf t; i 

\JL V CXliiOliiliti liUJilldluO / 


Polynomial trends 


same as above 
with f) > K -i- ^ 


same as above 
with M > -1- 1 


Leakage 
(d small) 


Use taper (1 - e^i^'^/")^ 
with taper order p > 1/2 — 6? 


take a > (1 + /3)/2 - d 
(sufficiently 
smooth wavelet). 


Rate of convergence 
of d to d 


„/3/(l+2/3) < 2) 
with m„ X n^/(i+2/3) 


ni^/a+m (/J < 2) 

with 2^" X nV(i+2/3) 


Asymptotic 
variance 


depends on 
taper order p only; 
GPH's i LWF's, 
as pooling order — > oo. 


depends on d and ^; 
LRW's with w^v ^ LWW's 
> LRW's with w°P* ((?!),£). 



Table 2. Fourier VS wavelets: trends, non-stationarity, non-invertibility. In the 
Wavelets column, M and a are defined in (W-1) (W-4) 



To allow comparison between wavelet and Fourier estimators, we must first link the normal- 
ization factors, (n2~-^")-'^/2 for wavelet estimators and ml/'^ for Fourier estimators. A Fourier 
estimator with bandwidth m„ projects the observations [Xi . . . X^]'^ on the space generated by 
the vectors {cos(27rA; • /n), sin(27rA; • /n)}, k = 1, . . . ,m„, whose dimension is 2m„; on the other 
hand, the wavelet coefficients {Wj-.fc, j > L,k = 0, . . . ,nj — 1} used in the wavelet estimator corre- 
spond to a projection on a space whose dimension is at most X]/=L„ ""i ~ X]j^L„ n2~^ ~ 2n2"^". 
Hence, for m„ or n2~^" large, it makes sense to consider n2~^" as an analog of the bandwidth 
parameter m„. 

We shall now compare the asymptotic variances in the CLT's in Theorems El |H [7] and [8l 
While the asymptotic variance of the Fourier estimators is a constant, the variance of the wavelet 
estimators is a function of the memory parameter, which can be numerically computed. For the 
Fourier estimators, the allowed range of the memory parameter d is given by (see Theorems [7] 
and ED 

5-T-l/2<d<(5 + l/2. (79) 

The length of this range equals t + 1, while the differentiation order 5 allows to shift it towards 
large values of d. For instance, if one wishes to shift the upper boundary of the range towards 
large values of d while keeping the lower boundary unchanged, one has to increase both r and 5 
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by the same factor. As shown in Theorem [HI increasing r inflates the asymptotic variance of the 
estimator. For wavelet estimators, the allowed range of the memory parameter d is given by 

l/2-a<d<M , (80) 

[see Theorems [3] and [H here we took (3 arbitrarily small, since we focus on the asymptotic 
variance in this discussion]. Of course one may still shift this range by a factor 6 to the right by 
differentiating the series X at the order 5 before processing the wavelet transform. This will also 
eliminate polynomial trends up to degree M + 6 — 1. 

Observe that the higher the a in (180p . the more negative d is allowed to be. This is because 
the higher the a, the smoother the wavelet ipj^k and hence the better the spectral resolution of 
the wavelet. This matters particularly when d < because /(A) is then very small around the 
origin making it harder to estimate d. In Fourier (see (j79p . it is r that plays a role similar to a. 

It is important to note that, for a given wavelet family such as Daubechies and Coiflets, 
increasing M yields a larger a, so that the allowed range is effectively increased by increasing 
M. In contrast to Fourier methods, by increasing the number of vanishing moments M, say of a 
Daubechies wavelet, the asymptotic variance converges to the asymptotic variance obtained with 
the Shannon wavelet, presented in ([70]) . Thus, for a given d, there is asymptotically no price to 
pay for increasing the number of vanishing moments M and the number of available scales. This 
is an argument in favor of wavelet estimators as compared to tapered Fourier estimator. This 
should, however, be interpreted with care. For a given sample size n, an increase of M causes 
an increase of the support of the wavelet and a decrease in the number of available scales. While 
this does not influence the asymptotic variances, it affects the performance on finite samplej^. 

The plots in Figure [T] indicate that the asymptotic variance of the LWW estimator is lower 
than the one obtained using the tapered version of LWF estimator, for most values of the memory 
parameter and this difference increases as r increases in order to adapt to larger ranges for d. 
The asymptotic variance p'^{d,i) in (j66p of the LWW estimator is displayed for £ = 7 using the 
Daubechies wavelets with M = 2 (Left) to M = 4 (Right). For these choices of wavelets, the 
corresponding a's are 1.34 and 1.91, and the allowed intervals for d are [—0.84, 2] and [—1.41,4], 
respectively. The asymptotic variances p^{d, oo) in ()67p for M = 2 and 4 can also be compared 
to the one of the Shannon wavelet on this plot. The asymptotic variance $(t) in (I75p of the 
LWF estimator is constant in d but increases when the taper order r increases from r = 2 to 
r = 4, these values corresponding to intervals lengths for d close to those of the M = 2, 4 wavelet 
estimators (a bit larger for the former: 3 versus 2.84, and smaller for the latter: 5 versus 5.41). 



The same remark apply to the poohng order for the GPH estimator: the asymptotic variance decreases as 
p ^ oo but in practice, one takes smaU values, e.g. p = 3, 4. 
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Using wavelet to estimate the memory parameter has several additional benefits compared 
to using Fourier estimators. The wavelets present a rich time/frequency representation of the 
pr ocess, which can be m o re informati v e tha n tha t of the classical Four i er an alysis, as discussed 



m 



Serroukh et al 



(|200m 



Stoev et al. 



Percival and Walden 



(120061). Wavelets can be 



(120061 1 and 

used to detect the presence of outliers or jumps in the mean. The short-range dependence of the 
wavelet coefficients suggests construction of bootstrap confidence intervals for functionals of the 
wavelet coefficients, a procedure referred to as wavestrapping. This technique, which still is not 
rigorously justified, may b e used to const r uct b ootstrapped confidence interval for the memory 
parameter; see for example 



Percival et al. 



mod ). 



10. A Monte-Carlo study 

In this section, we present some Monte-Carlo simulation results that compare the root-mean 
square error performance of our four estimators for finite samples. The four estimators are denoted 
GPH (Geweke-Porter-Hudak), LWF (local Whittle Fourier), LWW (local Whittle wavelet) and 
LRW (local regression wavelet). We consider three models and several parameter combinations 
for each model: 



(1) The ARFIMA models, introduced bv [Granger and Joveuxl (jl98d ). and generalized here to 
any value of the memory parameter d. We considered the ARFIMA(0,(i,0) and ARFIMA(l,d,0) 
with d in { — 1.2, 0, 0.3, 1.5, 2.5, 3.5} and sizable lag 1 AR coefficient equal to 0.8. The in- 
novation is assumed to be Gaussian. The short-memory component /* of the spectral 
density satisfies /* G W(2,7,7r), w here H is defined in Definition [2j 

(2) The DARFIMA models, defined in 



Andrews and Sun 



^2004 ). is an ARFIMA-like process 



that has a discontinuity in its spectral density at frequency A = Aq- The DARFIMA(l,(i,0) 
has the spectral density of an ARFIMA(l,(i,0) on the interval [— Ao,Ao] and is zero for 
|A| G [Ao,vr]. It is obtained by low-pass filtering of an ARFIMA(l,(i,0) trajectory by a 
truncated sine function in the time domain. We chose Aq = vr/2 and Gaussian innovations. 
(3) The third model is a non-linear function of a Gaussian sequence: Xt = G{Yt) where 
{Yt} is a stationary Gaussian sequence with zero-mean and variance 1 and G : M ^ M 
is a measurable function such that E[G^(lo)] < oo. Then, Xt may be expressed as the 
sum Xt = Co + '^^=koi^k/ kl)Hk{Yt) , where Hk{-) is the k-th Hermite polynomial and 
Ck = K[G{Yt)Hk{Yt)]. The minimal integer /cq > 1 such that c^g 7^ is called the Hermite 



so an 



20061 . 



rank of G. If y is a M{d) process with memory parameter < 1/2, then X is a 
M[d) process with memory parameter dx = | (1 — A:o(l — 2(iy)) (see ( Dalla et al.l . 
p. 229, Eq. (55)) for details). In simulations, we use G{x) = exp(2;) (for which = 1) 
and G{x) = H2{x) = x'^ — 1 (for which ko = 2) and denote those models SUBORDl and 
SUB0RD2, respectively. 
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For the estimators LWW and LWF, we have used a convex minimization procedure of the contrast 
functions (j29p and (j39p . In all cases, 1000 simulation runs for each value of d are used. This 
produces simulation standard errors that are roughly 3%. 

The tuning parameters of each estimation procedure have been chosen to allow a fair compari- 
son of those methods in a realistic setting, where the order of magnitude of the memory parameter 
d is only loosely known and where one may be in the presence of high-order polynomial trends. In 
order to cover all the values of d above (—1.2 < d < 3.5), we have used a Daubechies wavelet with 
M = 4 vanishing moments for the wavelet estimators (hence a ~ 1.91, see Tabled]) and we have 
differenced the series (5 = 4 times and have used a taper order r = 5 for the Fourier estimators. 
The corresponding admissible ranges are (see (iHOl) and (f79]l ) (—1.41,4] and (—1.5,4.5), respec- 
tively. For the GPH estimator, we took p = 4 m Relation (j35p defining the pooled periodogram. 
This reduces the number of frequencies by a factor T+p = 9 (see Relation ([35]) ). In the case of the 
LRW estimator, the computation of the optimal weights in the least-square cr iterion is numer- 
i cally quite involved so we ran the simulations using the weights suggested by lAbry and Veitch 
()1998l ). The difference in the results becomes significant only when d gets close to the boundaries 
of the admissible range, see Figure [2] for the asymptotic variance. The remaining free parameters 
are the number of frequencies (LWF) or blocks of frequencies (GPH with pooling) denoted m„, 
and the minimal (i.e. finest) dyadic wavelet scale L„ (for the LRW and the LWW estimators); 
the highest (i.e. coarsest) scale is chosen to be the highest available (C/„ = J„). 

The box and whisker plots of the estimators are displayed in Figures U] and [5] for different 
values of the bandwidth (Fourier methods) and the finest scale (wavelet methods). In Figure UJ 
the model is an ARFIMA(l,(i,0) with d = 1.5. In Figure the model is an DARFIMA(l,(i,0) 
with d = 0.3. The AR coefficient is 0.8 in both cases. These figures illustrate the bias-variance 
trade-off inherent to semi-parametric methods (the variance decreases as the bandwidth or the 
number of scales increases, but then the bias increases). In general, the standard deviation of the 
1000 runs of the wavelet methods is comparable to that of the Fourier methods. 

Tables [3] and m give the bias, variance, and RMSE (root mean square error) for models 1,2,3 
for sample sizes 512 and 4096, respectively. Those quantities are computed for the optimal 
bandwidth run (Fourier methods) or the optimal finest scale L„ (wavelet methods) in the RMSE- 
sense, whose values are displayed in the fourth column. For each model, the lowest RMSE among 
the four methods appears in boldface. Note that all the possible values of finest scale L„ have 
been considered, but only a subset of the many possible values of the bandwidth m„. The 
standard deviations of the LWW estimator are lower than those of the LRW estimator, which 
is consistent with our theoretical findings. Also, the standard deviations of the Fourier methods 
remain approximately constant for the different values of the memory parameter, whereas the 
variance of the wavelet methods increase with \d\. Also, as predicted by the expressions of 
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the limiting variance, the variance of the wavelet methods are lower than those of the Fourier 
methods, especially when d is small. The reported values of the standard deviations agree with 
our theoretical findings for the sample size n = 4096. 

For the non- linear processes, the results suggest that the wavelet estimator remains consistent. 
However, the presence of non-linearity worsens the behavior of the estimator at a given finite 
sample and a larger sample size is required to achieve a prescribed accuracy. 

The root mean square error is shown in some particular cases in Figure [H The MSE of the 
Fourier criteria is plotted against the value of the bandwidth, that is, m,„ for the LWF estimator 
and run x (p + r) for the GPH estimator. For the wavelet methods, the somehow arbitrary 
"equivalent bandwidth" abscissa is half the number of wavelet coefficients used by the estimators: 
^equiv _ 1 ^^^^ (ggg the discussion on the comparison of the asymptotic variances in the 
previous section). 

11. Software 

The software used to perform the estimation of the long-memory parameter was written in 
Matlab/Octave and may be obtained from the authors. It includes the four estimators (LWF, 
GPH, LWWand LRW), basic random processes generators and some other utilities such as the 
pyramidal algorithm for computing wavelet coefficient. 

Basic installation. After downloading the tar archive (toolboxLRD.tar) and expanding it 
in e.g. /home/user/octave, one has to add the directory to the search path: 
addpath(genpath( ' /home/user/octave/ToolboxLRD ' ) ) ; 

This line can be added to the . octave or . matlab file. Some demos are available in ToolboxLRD/Examples. 

Loading the data. Use the load command to load a data set (time series) into some vector, 
say X. One can also synthesize some trajectories using one's personal generator or the one 
present in the Utils subdirectory. For instance, a 4096 long trajectory of the ARFIMA model 
(1 — B)'^{Xt — aXt-i) = Zt with d = 1.4, a = 0.8 and Gaussian i.i.d sequence Z can be obtained 
by setting: 

n = 4096; 

X = randARFIMA(1.4, [0.8] , [] ,n) ; 

The argument [] above means that the MA part of the generated ARFIMA is a weak white noise 
(MA(0)). This example is used in the following to describe the package. 

Estimating the long-memory parameter. We shall now obtain the LRW, LWW, GPH 
and LWF estimators of the memory parameter d of the series as well as an estimate of their 
standard deviation using the asymptotic variance given in Theorems El [H [71 and [HI The standard 
deviations can be used to build asymptotic confidence intervals. 

1. The Geweke-Porter Hudak (GPH) estimator is obtained as follows: 
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param.taper=5; parain.pooling= 4; 

param.baiidwidth=[6 12 24 50]; param.diff order = 4; 
[d, stds]=GPH(x,param) 

where param . bandwidth is a vector giving the different values for the upper Fourier fre- 
quency m on which the regression is to be performed (Theorem [7]). The taper order r, 
poohng order p and differentiation order 5 are specified by param. taper, param. pooling 

and param. diff order, respectively. One obtains 
d = 

1.2744 1.2736 1.3670 1.3930 
stds = 

0.1803 0.1215 0.0840 0.0576 

Note that d and stds are vectors. In the above example, they have four components, corre- 
sponding respectively to the bandwidths m = 6, 12, 24, 50. 
2. The Local Whittle Fourier (LWF) estimator is invoked in the following way: 

par am. taper =5; param.diff order=4; 
param.baiidwidth=[50 100 200 500]; 
[d, stds] =LWF(x, param, [] ) 

One gets : 

d = 

1.3114 1.2974 1.2905 1.3422 
stds = 

0.1206 0.0853 0.0603 0.0381 



Here the minimization of (j39p is done over the whole real line. As for the LWW function, one 
may specify the range where to optimize the LWF contrast function (j39p by replacing the third 
argument [] by an interval [Ai, A2]. For instance, if one wants to restrict this minimization 
to the set ()79p of admissible values of d for the CLT Theorem [8] to hold, 

range= [ param.diff order-param.taper-0. 5, param. diff order+0 . 5] ; 
[d, stds] =LWF(x, param, range) 

In this specific case, the output is unchanged since the minimizing values of d are within the 
corresponding interval [—1.5,4.5]. 
3. The Local Whittle Wavelets (LWW) estimator is obtained as follows: 

LU = [6 9; 5 9; 4 9; 3 9] ; 

[phi, M, alpha] = scalingf ilter ( 'Daubechies ' ,4) ; 
[d, stds] = LWW(x,LU,phi, []) 

where phi indicates the scaling function, M the corresponding number of vanishing moments. 



alpha the Fourier decay exponent (see (W-1) (W-4)), x contains a finite set of observations 
and LU is a two column matrix, containing scales limits L and U in the LWW objective 
function ()29p . If LU is a one column vector then it contains different values of the lower scale 
L and U is taken equal to the maximal available scale index J defined in ()22p . The argument 
[] above means that the interval, denoted [Ai, A2] in Definition ()28p . over which the contrast 
function (j29p is minimized is (—00, 00). It can be replaced by an interval [Ai, A2], if one wants 
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to restrict the minimization to a particular range, for instance to the one given by (jSOp which 

corresponds to admissible values of d for the CLT Theorem [J] to hold. One gets : 
d = 

1.3183 1.3366 1.4209 1.3788 
stds = 

0.1138 0.0719 0.0479 0.0324 

4. The Local Regression Wavelet (LRW) estimator and the standard deviation is invoked in the 
following way: 

L = [6;5;4;3] ; 

[d.stds] = LRW(x,L,phi) 

The three first argument of LRW are the same as LWW but LU has been replaced by L, a column 

vector containing different choices for the lower scale L used in the regression, see Eq. (j24p . In 

this case, the upper scale is the largest scale available {U = Jn)- If one wants to take different 

values for U, a two-columns matrix must replace L, for example, by the LU in the LWW case. 

Here the LRW estimator is computed using Abry-Veitch weights (see [63|) and one gets the 

output: 
d = 

1.4161 1.3815 1.4305 1.3882 
stds = 

0.1020 0.0675 0.0461 0.0317 

But the LRW estimator can also be obtained using the optimal weights. In fact, the following 
additional output variables are available : 

(a) the value of a log-regression multiplicative constant c so that ct| w c 2^^^^ . Equivalently, 
log dj « log c + 2dj, where log c is the intercept of the regression line; 

(b) new estimates of d using the two-step procedure based on the optimal weights ([6T]) . These 
weights are computed using the preliminary value of d estimated with the Abry-Veitch 
weights; 

(c) the standard deviations of the new estimates of d; 

(d) the corresponding values of the log-regression multiplicative constant. 
Thus if the LRW call of the last example is replaced by 

[d.stds, c, dopt, stdopt, copt] = LRW(x,L,phi) 

the following additional output is added to the previous one : 



c = 














0.0074 


0, 


,0105 


0, 


,0067 


0, 


,0094 


dopt = 

1.4199 


1, 


,3852 


1, 


,4451 


1, 


,3834 


stdopt = 
0.1011 


0, 


,0666 


0, 


,0453 


0, 


,0311 


copt = 

0.0073 


0, 


,0103 


0, 


,0060 


0, 


,0097 



Alternatively, one could also use a different preliminary estimate of d, say d = 1.3823, a 
value obtained as the first output of the GPH routine above : 

[d.stds, c, dopt, stdopt, copt] = LRW (x,L, phi , 1 . 3823) 
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The last three output values are then replaced by 
dopt = 

1.4199 1.3852 1.4441 1.3835 
stdopt = 

0.1011 0.0666 0.0451 0.0310 
copt = 

0.0073 0.0103 0.0061 0.0097 

Obtaining confidence intervals. A routine for obtaining the asymptotically normal confi- 
dence intervals for d at a given level has also been included. It works as follows : 

p=0 . 95 ; d=dopt ; stds=stdopt ; 

[I] =Conf idencelntervaKd, stds ,p) 

where d and stds are any outputs of the above procedures and p is the confidence level. Here we 

used the last displayed estimates with p = 0.95 and get 
I = 

1.2217 1.6180 
1.2546 1.5158 
1.3557 1.5325 
1.3227 1.4443 

In this particular example we can see that the true value of d, namely 1.4, belongs to the four 
intervals. 

Obtaining the theoretical asymptotic variances. It is possible also to obtain directly the 
asymptotic variances Popt(^5^) P^{d^tj defined in (j62p and (j66p . These are the asymptotic 
variances of the LRW estimator (Theorem [3]), when, respectively, the optimal weigths (|6ip are 
chosen or when the Abry-Veitch weights (|63p are chosen. The asymptotic variance of the LWW 
estimator is also p'^{d,£) (Theorem H]). The approximation of p'^{d,i) given in (j70p and obtained 
by replacing ^ by the Shannon wavelet is also available. This is how to get these asymptotic 
variances : 

d=1.4; 1=5; 

[v, vs, vopt , wopt]= AsymptoticVariaiiceLRW(phi,d,l) 

Here p^{d,t) and p'^^^^{d,tj are computed for d = 1.4 and I = b and stacked in the output v and 
vopt respectively. The output vs corresponds to the Shannon approximation (|7Up and wopt to 
the optimal weights ()6ip . of length ^ + 1 = Q. For these values one gets 

V = 

. 5848 

vs = 

. 4949 
vopt = 

0.5698 
wopt = 

-0.2693 0.0546 0.0827 0.0587 0.0410 0.0322 

We observe that for this value of d the optimal variance (0.5698) is sensitively lower than the 
one obtained with Abry-Veitch weights (0.5848), but still larger than the Shannon approximation 
(0.4949). Since this approximation gets sharper as the number of vanishing moments of the 
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Daubechies wavelet increases, it indicates that one could get a better variance by increasing M, 
here M = 2. Notice, however, that the length of the wavelet filters would also increase and 
thus the number of available wavelet coefficients decrease for a finite n, an effect which is not 
considered in the asymptotic variance, see Section [9l 

12. Conclusion 

We have compared four semi-parametric methods for the estimation of the long-memory pa- 
rameter d in times series, two Fourier-based and two wavelet-based. These are the Geweke- 
Porter Hudak (GPH) [Regression/Fourier] , Local Whittle Fourier (LRW) [Whittle/Fourier], Lo- 
cal Regression Wavelet (LRW) [Regression/ Wavelets] and Local Whittle Wavelet (LWW) [Whit- 
tle/Wavelets]. We have discussed issues related to differencing, tapering and pooling in the case 
of Fourier-based estimators and choices of wavelets in the case of wavelet-based estimators. Con- 
ditions for the asymptotic normality of the estimators are specified in Theorems El HI [7] and [8l 

We have undertaken a Monte Carlo comparison. In the Monte Carlo study, we have focused on 
ARFIMA(0,d,0) and ARFIMA(l,(i,0) models with an AR(1) parameter equal to 0.8, a relatively 
high value, as well as on DARFIMA and subordinated models defined in Section [TOj All four 
methods appear to work well with similar performances at the optimal bandwidth lower scale 
index. We have also developed a software package for the benefit of the practitioner which 
computes the corresponding estimates of the long-memory parameter d and provides confidence 
intervals, based on the asymptotically normal distribution of the estimators. 

We noted that the LRW estimator with Abry-Veitch weights (j63p has the same asymptotic 
variance as the LWW estimator. This means that the LRW estimator, when used with the 
optimal weights ()6ip . has smaller asymptotic variance than the LWW estimator. 
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Model 



bias 



GPH 

std RMSE m°P' 



bias 



LWF 

std RMSE m°P' 



bias 



LRW 

std RMSE 



bias 



LWW 

std RMSE L°P' 



ARFIMA(0,-1.2,0) 



0.007 0.105 0.105 



26 



-0.108 0.071 0.129 



234 



0.047 0.106 0.116 



0.105 0.083 0.134 



ARFIMA(1,-1.2,0) 



0.000 0.161 0.161 



12 



-0.138 0.128 0.188 



72 



-0.093 0.108 0.142 



-0.048 0.083 0.096 



ARFIMA(0,0.0,0) 



-0.022 0.103 0.105 



26 



-0.099 0.073 0.123 



234 



-0.026 0.058 0.064 



-0.002 0.046 0.046 



ARFIMA(1,0.0,0) 



-0.073 0.154 0.170 



12 



-0.175 0.134 0.220 



72 



-0.169 0.104 0.198 



-0.058 0.143 0.154 



ARFIMA(0,0.3,0) 



-0.029 0.104 0.108 



26 



-0.094 0.071 0.118 



234 



-0.065 0.060 0.088 



-0.045 0.046 0.065 



ARFIMA(1,0.3,0) 



-0.076 0.151 0.169 



12 



-0.194 0.105 0.221 



108 



-0.172 0.100 0.199 



-0.060 0.143 0.154 



ARFIMA(0,1.5,0) 



-0.049 0.097 0.109 



26 



-0.077 0.072 0.105 



234 



-0.085 0.110 0.139 



-0.045 0.093 0.103 



ARFIMA(1, 1.5,0) 



-0.121 0.148 0.190 



12 



-0.182 0.106 0.210 



108 



-0.167 0.115 0.203 



-0.135 0.091 0.163 



ARFIMA(0,2.5,0) 



-0.039 0.094 0.102 



26 



-0.050 0.072 0.087 



234 



-0.093 0.120 0.152 



-0.047 0.097 0.108 



ARFIMA(1,2.5,0) 



-0.132 0.141 0.194 



12 



-0.157 0.108 0.190 



108 



-0.136 0.115 0.178 



-0.101 0.098 0.141 



ARFIMA(0,3.5,0) 



-0.023 0.092 0.095 



26 



-0.023 0.070 0.074 



234 



-0.077 0.110 0.134 



-0.037 0.089 0.097 



ARFIMA(1,3.5,0) 



-0.097 0.136 0.167 



12 



-0.111 0.109 0.155 



108 



-0.102 0.113 0.152 



-0.063 0.097 0.116 



DARFIMA(0,0.0,0) 



0.064 0.275 0.282 



0.002 0.162 0.162 



72 



-0.013 0.188 0.188 



0.072 0.139 0.157 



DARFIMA(1,0.0,0) 



0.031 0.287 0.288 



-0.027 0.155 0.157 



72 



-0.038 0.187 0.191 



0.041 0.144 0.149 



DARFIMA(0,0.3,0) 



0.036 0.282 0.284 



0.005 0.161 0.161 



72 



-0.038 0.190 0.193 



0.046 0.145 0.152 



DARFIMA(1,0.3,0) 



0.015 0.266 0.266 



-0.030 0.154 0.157 



72 



-0.064 0.182 0.193 



0.016 0.146 0.147 



SUBORD1(0,0.0,0) 



-0.018 0.099 0.101 



26 



-0.095 0.066 0.116 



234 



-0.032 0.069 0.076 



-0.003 0.057 0.057 



SUBORD1{1,0.0,0) 



0.121 0.114 0.166 



26 



0.006 0.075 0.076 



234 



-0.043 0.097 0.106 



-0.015 0.086 0.087 



SUBORD1{0,0.3,0) 



-0.111 0.113 0.159 



26 



-0.179 0.087 0.199 



234 



-0.155 0.090 0.179 



-0.127 0.083 0.152 



SUBORD1{1,0.3,0) 



-0.063 0.161 0.173 



17 



-0.133 0.157 0.206 



153 



-0.122 0.174 0.212 



-0.032 0.185 0.188 



SUBORD2(0,0.0,0) 



-0.014 0.103 0.104 



26 



-0.095 0.067 0.117 



234 



-0.026 0.061 0.066 



-0.001 0.048 0.048 



SUBORD2(1,0.0,0) 



0.207 0.260 0.332 



0.022 0.183 0.184 



45 



0.160 0.212 0.266 



0.293 0.197 0.353 



SUBORD2(0,0.3,0) 



0.014 0.115 0.116 



26 



-0.058 0.093 0.109 



234 



-0.005 0.076 0.077 



0.025 0.067 0.072 



0.036 0.168 0.172 



72 



0.174 0.088 0.195 



0.214 0.085 0.230 



SUBORD2{1,0.3,0) 0.157 0.209 0.262 

Table 3. Length of the time series = 512. Bias, standard deviation, root mean-square error and optimal 
bandwidth/minimal scale for the four estimators applied to the three models of Section [TOl The lowest RMSE 
among the four methods appears in boldface. 



Model 



bias 



GPH 

std RMSE 



bias 



LWF 

std RMSE 



bias 



LRW 

std RMSE L°P' 



bias 



LWW 

std RMSE 



ARFIMA(0,-1.2,0) 



-0.012 0.032 0.034 



224 



-0.031 0.022 0.038 



2016 



0.020 0.037 0.043 



0.038 0.032 0.050 



ARFIMA(1,-1.2,0) 



-0.024 0.061 0.065 



54 



-0.081 0.041 0.091 



486 



-0.046 0.023 0.051 



-0.037 0.021 0.043 



ARFIMA(0,0.0,0) 



-0.016 0.031 0.035 



224 



-0.027 0.022 0.035 2016 



-0.006 0.014 0.015 



0.000 0.012 0.012 



ARFIMA(1,0.0,0) 



-0.037 0.060 0.070 



54 



-0.072 0.043 0.084 



486 



-0.042 0.034 0.055 



-0.031 0.029 0.043 



ARFIMA(0,0.3,0) 



-0.017 0.031 0.036 



224 



-0.026 0.023 0.034 



2016 



-0.019 0.022 0.029 



-0.010 0.019 0.021 



ARFIMA(1,0.3,0) 



-0.044 0.062 0.076 



54 



-0.071 0.043 0.083 



486 



-0.042 0.036 0.056 



-0.029 0.030 0.041 
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o 



ARFIMA(0, 1.5,0) 



-0.017 0.031 0.035 



224 



-0.021 0.021 0.029 2016 



-0.038 0.026 0.046 



-0.028 0.024 0.037 



ARFIMA(1, 1.5,0) 



-0.051 0.057 0.077 



54 



-0.062 0.041 0.074 



486 



-0.038 0.042 0.057 



-0.022 0.037 0.043 



ARFIMA(0,2.5,0) 



-0.013 0.030 0.033 



224 



-0.014 0.022 0.026 2016 



-0.040 0.030 0.050 



-0.029 0.027 0.040 



ARFIMA(1,2.5,0) 



-0.043 0.058 0.072 



54 



-0.047 0.042 0.063 



486 



-0.035 0.044 0.056 



-0.018 0.039 0.043 



ARFIMA(0,3.5,0) 



-0.005 0.030 0.031 



224 



-0.006 0.022 0.023 2016 



-0.033 0.028 0.043 



-0.019 0.027 0.034 



ARFIMA(1,3.5,0) 



-0.033 0.058 0.066 



54 



-0.035 0.044 0.056 



486 



-0.029 0.044 0.053 



-0.014 0.041 0.043 



DARFIMA(0,0.0,0) 



-0.023 0.060 0.064 



54 



-0.055 0.042 0.069 



486 



0.025 0.035 0.043 



-0.002 0.044 0.044 



DARFIMA(1,0.0,0) 



-0.037 0.060 0.070 



54 



-0.072 0.043 0.084 



486 



0.010 0.034 0.035 



0.029 0.029 0.041 



DARFIMA(0,0.3,0) 



-0.024 0.063 0.068 



54 



-0.053 0.044 0.069 



486 



0.012 0.036 0.038 



0.033 0.030 0.044 



DARFIMA(1,0.3,0) 



-0.044 0.062 0.076 



54 



-0.071 0.043 0.083 



486 



-0.003 0.037 0.037 



0.016 0.030 0.034 



SUBORD1(0,0.0,0) 



-0.015 0.030 0.033 



224 



-0.027 0.021 0.034 



2016 



-0.006 0.014 0.016 



0.000 0.013 0.013 



SUBORDl(l, 0.0,0) 



0.028 0.089 0.093 



26 



0.032 0.028 0.043 2016 



0.034 0.026 0.042 



0.044 0.026 0.051 



SUBORD1(0,0.3,0) 



-0.103 0.043 0.112 



224 



-0.115 0.038 0.121 



2016 



-0.103 0.038 0.109 



-0.082 0.051 0.096 



SUBORDl(l, 0.3,0) 



-0.091 0.073 0.117 



110 



-0.086 0.076 0.115 



990 



-0.093 0.057 0.109 



-0.060 0.066 0.089 



SUBORD2(0,0.0,0) 



-0.018 0.031 0.036 



224 



-0.028 0.022 0.036 



2016 



-0.006 0.014 0.016 



-0.001 0.013 0.013 



SUBORD2(l, 0.0,0) 



0.043 0.096 0.106 



26 



-0.008 0.071 0.072 



234 



0.054 0.063 0.083 



0.031 0.090 0.095 



SUBORD2(0,0.3,0) 



0.032 0.045 0.055 



224 



0.022 0.040 0.046 



2016 



0.028 0.025 0.037 



1 



0.035 0.024 0.042 



SUBORD2(l, 0.3,0) 



-0.013 



0.079 0.080 

Table 4. 



37 

Same 



0.043 0.061 

as Tabled 



0.074 
with len 



486 
gth of 



-0.045 0.063 0.078 4 

the time series = 4096. 



-0.006 0.054 0.055 
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Figure 1 . Comparison of the asymptotic variances of LWF and LWW estimators 
as functions of d. The dot /dash hne displays the variance ()75p of the LWF estima- 
tor with taper order r; the plain curve displays the variance p^{d,l) in (j66p with 
I = 7 ol the LWW estimator using Daubechies wavelets of order M; the dotted 
curve displays the variance (j67p of the LWW estimator using the ideal Shannon 
wavelet. Left panel: r = 2 for the LWF, M = 2 for the LWW. Right panel: r = 4 
for the LWF, M = 4 for LWW. 
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Figure 2. Comparison of the asymptotic variance of the LRW estimator using 
Abry-Veitch weights given by (I63p with the LRW estimator using optimal weights 
given by (f6T]) . We plot p^{d,t} — plpt{d,i) as a function of d with i = 7. We used 
Daubechies wavelets for two different values for M. Left panel: M = 2. Right 
panel: M = 4. 
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Figure 3. Comparison of the asymptotic variance (j66p of the LRW estimator 
using Abry-Veitch weights given by (j63p with the one of LRW estimator using 
optimal weights given by ([6T]) . We plot p^p^ {d, t) as a function of d for successive 
values of ^ = 1,3,5,7,9 (from top to bottom). We used Daubechies wavelets for 
two different values for M. Left panel: M = 2. Right panel: M = 4. 
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